W!o+'s 《小伶鼬工坊演義》: Neural Networks 【Sigmoid】 7

Here is how Michael Nielsen concludes this section:

Let me conclude this section by discussing a point that sometimes bugs people new to gradient descent. In neural networks the cost C is, of course, a function of many variables – all the weights and biases – and so in some sense defines a surface in a very high-dimensional space. Some people get hung up thinking: “Hey, I have to be able to visualize all these extra dimensions”. And they may start to worry: “I can’t think in four dimensions, let alone five (or five million)”. Is there some special ability they’re missing, some ability that “real” supermathematicians have? Of course, the answer is no. Even most professional mathematicians can’t visualize four dimensions especially well, if at all. The trick they use, instead, is to develop other ways of representing what’s going on. That’s exactly what we did above: we used an algebraic (rather than visual) representation of \Delta C to figure out how to move so as to decrease C. People who are good at thinking in high dimensions have a mental library containing many different techniques along these lines; our algebraic trick is just one example. Those techniques may not have the simplicity we’re accustomed to when visualizing three dimensions, but once you build up a library of such techniques, you can get pretty good at thinking in high dimensions. I won’t go into more detail here, but if you’re interested then you may enjoy reading this discussion of some of the techniques professional mathematicians use to think in high dimensions. While some of the techniques discussed are quite complex, much of the best content is intuitive and accessible, and could be mastered by anyone.
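The "algebraic representation of \Delta C" Nielsen refers to is the first-order approximation \Delta C \approx \nabla C \cdot \Delta v. Choosing \Delta v = -\eta \nabla C gives \Delta C \approx -\eta \|\nabla C\|^2 \le 0, and nothing in that argument needs a picture. As a minimal sketch, assuming a made-up quadratic cost C(v) = \sum_i w_i v_i^2 (not the network cost from the book) and hypothetical helper names, the update below works in 1000 dimensions using only that algebra:

```python
import random


def cost(v, w):
    """Assumed example cost C(v) = sum_i w_i * v_i**2 (illustration only, not Nielsen's cost)."""
    return sum(wi * vi * vi for wi, vi in zip(w, v))


def grad(v, w):
    """Gradient of the quadratic cost above: dC/dv_i = 2 * w_i * v_i."""
    return [2.0 * wi * vi for wi, vi in zip(w, v)]


def gradient_descent(v, w, eta, steps):
    """Apply v -> v - eta * grad C(v) repeatedly; return the final v and the cost history."""
    history = [cost(v, w)]
    for _ in range(steps):
        g = grad(v, w)
        v = [vi - eta * gi for vi, gi in zip(v, g)]
        history.append(cost(v, w))
    return v, history


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 1000  # far more dimensions than anyone can visualize
    w = [rng.uniform(0.5, 2.0) for _ in range(dim)]
    v0 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    _, history = gradient_descent(v0, w, eta=0.1, steps=50)
    print(history[0], history[-1])  # the cost shrinks at every step
```

With these weights each coordinate is multiplied by 1 - 2\eta w_i, which lies between 0.6 and 0.9, so the cost falls monotonically; the learning rate 0.1 is an assumption chosen to keep that factor positive.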

───

 

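This reminds the author of the story of the 'two-dimensional creatures'. Suppose they live on a very large 'sphere': could they ever 'discover' that it is not a 'plane'? Or could they, too, describe it using the concept of 'three-dimensional' space?? Which is harder: to 'imagine' a 'hypersurface'

S(x_1, x_2, x_3, \cdots) = c

or to understand what Galois was 'able to think'?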

[Figures: permutations of three colors (RGB); the 15-puzzle; the Cayley table and permutation matrices of the symmetric group S_3; permutations with repetition]

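Let us take a brief look at how 'Galois' thought. Suppose x_1,x_2,\cdots, x_n are the 'roots' of the 'polynomial' equation P(x) = \sum \limits_{k=0}^{k=n} c_k x^k = 0, where the coefficients c_k are all 'rational numbers'. If we construct the 'symmetric function'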

f(x_1,x_2,x_3,\cdots,x_n) = (x-x_1)(x-x_2)(x-x_3)\cdots(x-x_n)

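then, after expanding it, c_n f(x_1,x_2,x_3,\cdots,x_n) is exactly P(x). Now apply an arbitrary 'permutation' to these 'roots' x_1,x_2,\cdots, x_n, for example \begin{pmatrix} x_1 & x_2 & x_3 & \cdots & x_n \\ x_2 & x_n & x_4 & \cdots & x_1\end{pmatrix}, meaning that each 'root' in the top row has its 'position' taken over by the 'root' below it. Because of the special 'form' of f, we get f(x_1,x_2,x_3, \cdots,x_n)=f(x_2,x_n,x_4,\cdots,x_1). In fact, for every 'permutation' we have f(x_1,x_2,\cdots,x_n)=f(x_2,x_1,\cdots,x_n)=f(x_3,x_1,\cdots,x_n,x_{n-1}), and so on. That is why f is called a 'symmetric function', and this 'invariance' under 'permutations' is precisely the main object of 'Galois's' study. For example, consider a quadratic equation x^2 + A x + B = 0 with two roots \lambda_1, \lambda_2. Then

F(\lambda_1, \lambda_2) = (x - \lambda_1)(x - \lambda_2) = x^2 -(\lambda_1 + \lambda_2) x + \lambda_1 \cdot \lambda_2 = x^2 + A x + B,

and comparing coefficients gives \lambda_1 + \lambda_2 = -A and \lambda_1 \cdot \lambda_2 = B. These two 'algebraic expressions' are also 'symmetric' in \lambda_1, \lambda_2. If we regard them as a 'system of simultaneous equations' in two variables and eliminate one variable, what we get is necessarily the 'equivalent' quadratic equation {\lambda_1}^2 + A \lambda_1 + B = 0 or {\lambda_2}^2 + A \lambda_2 + B = 0.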
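The expansion and its invariance under permutations can be checked with exact rational arithmetic. A minimal sketch (the helper names are hypothetical; `fractions.Fraction` and `itertools.permutations` are from the standard library) multiplies out c_n (x - x_1)\cdots(x - x_n) and confirms that every reordering of the roots gives the same coefficients:

```python
from fractions import Fraction
from itertools import permutations


def poly_from_roots(roots, leading=1):
    """Expand leading * (x - r_1)(x - r_2)...(x - r_n).

    Returns coefficients as Fractions, highest degree first,
    so [1, -5, 6] stands for x^2 - 5x + 6.
    """
    coeffs = [Fraction(leading)]
    for r in roots:
        r = Fraction(r)
        times_x = coeffs + [Fraction(0)]                    # p(x) * x
        times_r = [Fraction(0)] + [r * c for c in coeffs]   # p(x) * r, aligned by degree
        coeffs = [a - b for a, b in zip(times_x, times_r)]  # p(x) * (x - r)
    return coeffs


def is_symmetric_under_all_permutations(roots):
    """Check that every reordering of the roots yields the same expanded polynomial."""
    reference = poly_from_roots(roots)
    return all(poly_from_roots(list(p)) == reference for p in permutations(roots))


if __name__ == "__main__":
    # Roots 2 and 3 give x^2 - 5x + 6, i.e. A = -5, B = 6:
    # lambda_1 + lambda_2 = 5 = -A and lambda_1 * lambda_2 = 6 = B.
    print(poly_from_roots([2, 3]))  # [Fraction(1, 1), Fraction(-5, 1), Fraction(6, 1)]
    print(is_symmetric_under_all_permutations([2, Fraction(-1, 3), 5]))  # True
```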

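This leads to a very important result. Suppose \lambda_1 = a + b \sqrt{Q} is one root of the equation, with a, b rational, b \neq 0 and \sqrt{Q} irrational, and suppose the other root is \lambda_2 = c + d \sqrt{Q} with c, d rational. Since
\lambda_1 + \lambda_2 = -A
\Longrightarrow (a + b \sqrt{Q}) + (c + d \sqrt{Q}) = -A
\Longrightarrow (a + c + A) + (b + d) \sqrt{Q} = 0
\therefore a + c + A = 0, \ b + d = 0
(a rational number plus a rational multiple of the irrational \sqrt{Q} can vanish only if both parts are 0), and since
\lambda_1 \cdot \lambda_2 = B
\Longrightarrow (a + b \sqrt{Q}) \cdot (c + d \sqrt{Q}) = B
\Longrightarrow (a c + b d Q - B) + (a d + b c) \sqrt{Q} = 0
\therefore a c + b d Q = B, \ a d + b c = 0
we get d = -b; substituting into a d + b c = 0 gives b(c - a) = 0, hence c = a. Then a + c + A = 0 gives a = - \frac{A}{2}, and a c + b d Q = a^2 - b^2 Q = B gives b \sqrt{Q} = \frac{\sqrt{A^2 - 4B}}{2} (up to sign). In other words, the two roots are the familiar \frac{- A + \sqrt{A^2 - 4B}}{2} and \frac{- A - \sqrt{A^2 - 4B}}{2}. So for the 'symmetric function' f(x_1,x_2,x_3,\cdots,x_n) = (x-x_1)(x-x_2)(x-x_3)\cdots(x-x_n), if some root x_k has the form a + b \sqrt{Q}, then there must be another root x_j of the form a - b \sqrt{Q}. This is because the coefficients of the 'polynomial' are 'rational numbers': its 'square-root' solutions always come in 'pairs'! Hence the number of 'square-root' solutions must also be 'even'!!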
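The argument treats a + b\sqrt{Q} as a pair of rationals (a, b), and that is easy to model exactly. A minimal sketch, assuming a hypothetical `Surd` class built on `fractions.Fraction` (Q must not be a perfect square for the component-wise equality to be meaningful), shows that the root and its conjugate sum to -A, multiply to B, and both satisfy the equation:

```python
from fractions import Fraction


class Surd:
    """Exact number a + b*sqrt(Q) with rational a, b and a fixed rational Q.

    __eq__ compares components, which assumes sqrt(Q) is irrational; that is
    exactly why (a + c + A) + (b + d)*sqrt(Q) = 0 forces both parts to vanish.
    """

    def __init__(self, a, b, Q):
        self.a, self.b, self.Q = Fraction(a), Fraction(b), Fraction(Q)

    def _lift(self, x):
        """Treat a plain rational x as x + 0*sqrt(Q)."""
        return x if isinstance(x, Surd) else Surd(x, 0, self.Q)

    def __add__(self, other):
        other = self._lift(other)
        return Surd(self.a + other.a, self.b + other.b, self.Q)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # (a + b sqrt Q)(c + d sqrt Q) = (ac + bdQ) + (ad + bc) sqrt Q
        return Surd(self.a * other.a + self.b * other.b * self.Q,
                    self.a * other.b + self.b * other.a, self.Q)

    __rmul__ = __mul__

    def conjugate(self):
        return Surd(self.a, -self.b, self.Q)

    def __eq__(self, other):
        other = self._lift(other)
        return (self.a, self.b, self.Q) == (other.a, other.b, other.Q)

    def __repr__(self):
        return f"{self.a} + ({self.b})*sqrt({self.Q})"


def quadratic_roots(A, B):
    """Roots of x^2 + A x + B = 0 as a +/- b*sqrt(Q), with a = -A/2, b = 1/2, Q = A^2 - 4B."""
    A, B = Fraction(A), Fraction(B)
    root = Surd(-A / 2, Fraction(1, 2), A * A - 4 * B)
    return root, root.conjugate()


if __name__ == "__main__":
    r1, r2 = quadratic_roots(2, -1)  # x^2 + 2x - 1 = 0, roots -1 +/- (1/2)*sqrt(8)
    print(r1 + r2 == -2, r1 * r2 == -1)  # True True: sum = -A, product = B
    print(r1 * r1 + 2 * r1 + (-1) == 0)  # True: r1 really is a root
```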

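If we examine a cubic equation x^3 + A x^2 + B x + C = 0 with three roots \lambda_1, \lambda_2, \lambda_3, we have F(\lambda_1, \lambda_2, \lambda_3) = (x - \lambda_1)(x - \lambda_2)(x - \lambda_3). If \lambda_1, \lambda_2 are a pair of 'square-root' solutions, then \lambda_3 must be a 'rational number', since \lambda_1 + \lambda_2 + \lambda_3 = -A while \lambda_1 + \lambda_2 = 2a is rational. Moreover, since x \to +\infty \Longrightarrow F(\lambda_1, \lambda_2, \lambda_3) \to +\infty and x \to -\infty \Longrightarrow F(\lambda_1, \lambda_2, \lambda_3) \to -\infty, and F is continuous, a cubic equation with real coefficients has at least one real solution. If we substitute x = z - \frac{A}{3} to eliminate the 'squared term' A x^2, we get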

{(z - \frac{A}{3})}^3 + A {(z - \frac{A}{3})}^2 + B {(z - \frac{A}{3})} + C

= \left( z^3 - z^2 A + \frac{z A^2}{3} - \frac{A^3}{27} \right) + A \left( z^2 - \frac{2 z A}{3} + \frac{A^2}{9} \right) + B \left( z - \frac{A}{3} \right) + C

= z^3 + \left(- \frac{A^2}{3} + B \right) z + \left( \frac{2 A^3}{27} - \frac{B A}{3} + C \right)

\equiv_{df} z^3 + p z + q
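The substitution can be checked mechanically. A minimal sketch with exact `fractions.Fraction` arithmetic (the helper names are illustrative, not from the source) computes p = B - \frac{A^2}{3} and q = \frac{2A^3}{27} - \frac{AB}{3} + C and compares both sides at sample points; since both sides are cubics in z, agreement at more than three points already proves the identity:

```python
from fractions import Fraction


def depress_cubic(A, B, C):
    """(p, q) such that x = z - A/3 turns x^3 + A x^2 + B x + C into z^3 + p z + q."""
    A, B, C = Fraction(A), Fraction(B), Fraction(C)
    p = B - A * A / 3
    q = 2 * A ** 3 / 27 - A * B / 3 + C
    return p, q


def cubic(x, A, B, C):
    """Evaluate x^3 + A x^2 + B x + C."""
    return x ** 3 + A * x ** 2 + B * x + C


def check_substitution(A, B, C, samples=range(-5, 6)):
    """Exactly compare the original cubic at x = z - A/3 with the depressed cubic at z."""
    p, q = depress_cubic(A, B, C)
    shift = Fraction(A) / 3
    return all(cubic(z - shift, A, B, C) == z ** 3 + p * z + q
               for z in map(Fraction, samples))


if __name__ == "__main__":
    # x^3 + 6x^2 + 11x + 6 has roots -1, -2, -3; with x = z - 2 it becomes z^3 - z.
    print(depress_cubic(6, 11, 6))       # (Fraction(-1, 1), Fraction(0, 1))
    print(check_substitution(6, 11, 6))  # True
```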

So why eliminate the 'squared term' at all? Consider

F(\lambda_1, \lambda_2, \lambda_3) = (x - \lambda_1)(x - \lambda_2)(x - \lambda_3)

= x^3 - (\lambda_1 + \lambda_2 + \lambda_3) x^2 + \left[\lambda_1 \cdot \lambda_2  + \lambda_3 \cdot (\lambda_1 + \lambda_2) \right]  x - \lambda_1 \lambda_2 \lambda_3

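When the 'squared term' is 'zero', -(\lambda_1 + \lambda_2 + \lambda_3) = 0. This suggests writing \lambda_3 = (- \lambda_1) + (- \lambda_2) = u + v; that is, one root can be expressed as the sum of 'two special numbers'. Substituting z = u + v into the depressed equation, we get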

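(u + v)^3 + p(u + v) + q = (u^3 + v^3 + q) + (u + v)(3uv + p) = 0

because (u + v)^3 = u^3 + v^3 + 3uv(u + v). If we 'choose' u, v so that 3uv + p = 0, then we also get u^3 + v^3 + q = 0. This pair u, v therefore satisfies a 'system of simultaneous equations', which can be rewritten as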

uv = - \frac{p}{3} \Longrightarrow u^3 \cdot v^3 = - \frac{p^3}{27}, and

u^3 + v^3 = -q. This says precisely that u^3, v^3 are the two roots of the 'quadratic equation' t^2 + q t - \frac{p^3}{27} = 0. Solving it gives

u^{3}=-{q\over 2} + \sqrt{{q^{2}\over 4}+{p^{3}\over 27}}, \  v^{3}=-{q\over 2} - \sqrt{{q^{2}\over 4}+{p^{3}\over 27}}

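so z=u+v=\sqrt[3]{-{q\over 2}+ \sqrt{{q^{2}\over 4}+{p^{3}\over 27}}} +\sqrt[3]{-{q\over 2}- \sqrt{{q^{2}\over 4}+{p^{3}\over 27}}}. But shouldn't a cubic equation have three solutions? If \frac{q^2}{4}+\frac{p^3}{27} > 0, the 'radical' \sqrt{{q^{2}\over 4}+{p^{3}\over 27}} is a positive real number, so u^3, v^3 are real, and z, being the sum of two real 'cube roots', is real as well. How, then, do we obtain the other two solutions? Suppose y^3 = \alpha. Then (y w)^3 = y^3 \cdot w^3 = \alpha as well, provided w^3 = 1. Since w^3 - 1 = (w - 1)(w^2 + w +1) = 0, we find w = 1, \ w = \omega = \frac{-1 + i \sqrt{3}}{2}, \ w = {\omega}^2 = \frac{-1 - i \sqrt{3}}{2}, so the three solutions for y are \sqrt[3]{\alpha}, \sqrt[3]{\alpha} \ \omega, \sqrt[3]{\alpha} \ {\omega}^2. Because each pair must still satisfy uv = -\frac{p}{3}, and \omega \cdot {\omega}^2 = {\omega}^3 = 1, we finally obtain the three pairs (u, v), (u \ \omega, v \ { \omega}^2) and (u \ {\omega}^2, v \ \omega), i.e. the three roots u + v, u \omega + v {\omega}^2 and u {\omega}^2 + v \omega. Today this method is called 'Cardano's method'. Girolamo Cardano was an encyclopedic scholar of the Italian Renaissance whose main achievements were in mathematics, physics and medicine. In his book Ars Magna ('The Great Art'), published in 1545, he was the first to publish a general method for solving cubic equations. As far as the history of mathematics goes, however, the person who actually discovered this method for the cubic may have been Niccolò Tartaglia, and the two became enemies over it. The book also records the general solution of the quartic equation, which was in fact discovered by his student 'Ferrari'? This is practically a tailor-made example for that 'experimental philosophy', x-phi!!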
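Cardano's method as described above fits in a few lines of floating-point code. This is a sketch, not a robust solver: it assumes Python's `cmath.sqrt` for the possibly complex square root and the principal complex cube root `u3 ** (1 / 3)`, and the function names are hypothetical. Instead of taking two independent cube roots, it sets v = -\frac{p}{3u}, which enforces the constraint uv = -\frac{p}{3} used above to pair u with v:

```python
import cmath

OMEGA = complex(-0.5, 3 ** 0.5 / 2)  # primitive cube root of unity, (-1 + i*sqrt(3)) / 2


def cardano_roots(p, q):
    """All three roots of z^3 + p z + q = 0 by Cardano's method (floating-point sketch)."""
    disc = cmath.sqrt(q * q / 4 + p ** 3 / 27)
    # u^3 and v^3 are the two roots of t^2 + q t - p^3/27 = 0;
    # take the one of larger magnitude so that u below is not (nearly) zero.
    u3 = max(-q / 2 + disc, -q / 2 - disc, key=abs)
    if u3 == 0:
        # Only possible when p = q = 0: z^3 = 0 has the triple root 0.
        return [0j, 0j, 0j]
    u = u3 ** (1 / 3)   # principal complex cube root of u^3
    v = -p / (3 * u)    # choose v so that uv = -p/3 holds exactly
    # (u, v), (u w, v w^2), (u w^2, v w) all keep uv unchanged because w^3 = 1.
    return [u + v, u * OMEGA + v * OMEGA ** 2, u * OMEGA ** 2 + v * OMEGA]


def solve_cubic(A, B, C):
    """Roots of x^3 + A x^2 + B x + C = 0: depress via x = z - A/3, then apply Cardano."""
    p = B - A * A / 3
    q = 2 * A ** 3 / 27 - A * B / 3 + C
    return [z - A / 3 for z in cardano_roots(p, q)]


if __name__ == "__main__":
    for r in solve_cubic(6, 11, 6):  # expected to be close to -1, -2, -3
        print(round(r.real, 9), round(r.imag, 9))
```

Even when all three roots are real (as for x^3 + 6x^2 + 11x + 6), the intermediate u, v are complex; this is the classical "casus irreducibilis", which is why the sketch works in complex arithmetic throughout.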

── Excerpted from 《【Sonic π】電路學之補充《四》無窮小算術‧中下下‧中》 (【Sonic π】 Supplement to Circuit Theory, Part 4: Infinitesimal Arithmetic)

 

It is always a matter of opinion, isn't it!!

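Who was it, anyway, and why did people start treating the 28 \times 28 = 784 pixels as a 'column vector'? Isn't a so-called 'handwritten Arabic numeral' a 'two-dimensional image'??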
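In Nielsen's code each 28 \times 28 MNIST image enters the network as a NumPy array of shape (784, 1). A minimal pure-Python sketch (row-major order and the helper names are assumptions for illustration) shows that the flattening itself is only bookkeeping: pixel (r, c) lands at index 28r + c, and the two-dimensional layout can be restored exactly.

```python
WIDTH = 28  # MNIST images are 28 x 28 pixels


def flatten_row_major(image):
    """Turn a list of 28 rows (each 28 pixel values) into one list of 784 values, row by row."""
    return [pixel for row in image for pixel in row]


def unflatten(vector, width=WIDTH):
    """Inverse of flatten_row_major: cut the long vector back into rows of `width` pixels."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]


def flat_index(r, c, width=WIDTH):
    """Index of pixel (row r, column c) inside the flattened vector."""
    return r * width + c


if __name__ == "__main__":
    # Hypothetical test image whose pixel value encodes its own position.
    image = [[100 * r + c for c in range(WIDTH)] for r in range(WIDTH)]
    vector = flatten_row_major(image)
    print(len(vector))                              # 784
    print(vector[flat_index(3, 5)] == image[3][5])  # True
    print(unflatten(vector) == image)               # True
```

What the flattening does give up is any built-in notion of which pixels are neighbors: a fully connected layer sees 784 separate inputs, and the two-dimensional arrangement survives only in the index arithmetic above.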