Let us once again review the model of the 'perceptron':
Perceptrons
What is a neural network? To get started, I’ll explain a type of artificial neuron called a perceptron. Perceptrons were developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts. Today, it’s more common to use other models of artificial neurons – in this book, and in much modern work on neural networks, the main neuron model used is one called the sigmoid neuron. We’ll get to sigmoid neurons shortly. But to understand why sigmoid neurons are defined the way they are, it’s worth taking the time to first understand perceptrons.
So how do perceptrons work? A perceptron takes several binary inputs, $x_1, x_2, \ldots$, and produces a single binary output:

$$\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$$

That’s all there is to how a perceptron works!
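The rule above can be sketched in a few lines of Python; the weights, inputs, and threshold below are made-up values chosen purely for illustration:

```python
# A minimal sketch of a perceptron: output 1 if the weighted sum of
# the binary inputs exceeds the threshold, else output 0.
def perceptron(x, w, threshold):
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s > threshold else 0

# Three binary inputs with (hypothetical) weights (6, 2, 2), threshold 5.
print(perceptron([1, 0, 0], w=[6, 2, 2], threshold=5))  # → 1 (sum 6 > 5)
print(perceptron([0, 1, 1], w=[6, 2, 2], threshold=5))  # → 0 (sum 4 ≤ 5)
```

Note that the first input alone is enough to fire this neuron, while the other two together are not; the weights encode how much each input matters.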
……
If we regard $\sum_j w_j x_j = \text{threshold}$ as a 'hyperplane', then the weight vector $w$ is the 'normal vector' perpendicular to that hyperplane.

A perceptron's 'output' value is therefore determined by whether the 'input' point $x$ lies on the hyperplane, or on one side or the other of it.

Since the perceptron's 'output' is 'discrete' (0 or 1), it is hard to be sure that input points 'near' a given $x$ will produce the 'same' output value. From the standpoint of 'learning', this means a 'small' change in the normal vector $w$ or in the threshold value could throw the earlier 'learning results' into disarray!
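This instability is easy to demonstrate: in the sketch below (weights and inputs are hypothetical, chosen only to sit right at the threshold), a change of one part in a thousand in a single weight flips the discrete output:

```python
# The perceptron's hard threshold makes its output unstable under
# tiny parameter changes near the decision hyperplane.
def perceptron(x, w, threshold):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > threshold else 0

x = [1, 1]
print(perceptron(x, w=[0.5, 0.5], threshold=1.0))    # sum = 1.000 → 0
print(perceptron(x, w=[0.5, 0.501], threshold=1.0))  # sum = 1.001 → 1
```

A weight change of 0.001 flipped the output from 0 all the way to 1, which is exactly the difficulty for learning described above.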
Michael Nielsen therefore goes on to discuss
Sigmoid neurons
Learning algorithms sound terrific. But how can we devise such algorithms for a neural network? Suppose we have a network of perceptrons that we’d like to use to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned, handwritten image of a digit. And we’d like the network to learn weights and biases so that the output from the network correctly classifies the digit. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we’d like is for this small change in weight to cause only a small corresponding change in the output from the network. As we’ll see in a moment, this property will make learning possible. Schematically, here’s what we want (obviously this network is too simple to do handwriting recognition!):
If it were true that a small change in a weight (or bias) causes only a small change in output, then we could use this fact to modify the weights and biases to get our network to behave more in the manner we want. For example, suppose the network was mistakenly classifying an image as an “8” when it should be a “9”. We could figure out how to make a small change in the weights and biases so the network gets a little closer to classifying the image as a “9”. And then we’d repeat this, changing the weights and biases over and over to produce better and better output. The network would be learning.
───
Why use 'sigmoid neurons' at all? Because the celebrated 'sigmoid function'
Sigmoid function
A sigmoid function is a mathematical function having an “S” shape (sigmoid curve). Often, sigmoid function refers to the special case of the logistic function shown in the first figure and defined by the formula

$$S(t) = \frac{1}{1 + e^{-t}}$$
Other examples of similar shapes include the Gompertz curve (used in modeling systems that saturate at large values of t) and the ogee curve (used in the spillway of some dams). A wide variety of sigmoid functions have been used as the activation function of artificial neurons, including the logistic and hyperbolic tangent functions. Sigmoid curves are also common in statistics as cumulative distribution functions, such as the integrals of the logistic distribution, the normal distribution, and Student’s t probability density functions.
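A short sketch contrasts this with the perceptron: the sigmoid is smooth, so a small change in its input produces only a small change in its output, which is precisely the property Nielsen wants for learning:

```python
import math

def sigmoid(t):
    """The logistic sigmoid S(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

print(sigmoid(0.0))                              # → 0.5
# A tiny input change gives a proportionally tiny output change
# (slope at 0 is 1/4), unlike the perceptron's all-or-nothing jump.
print(sigmoid(0.001) - sigmoid(0.0))
# Symmetry about the midpoint: S(t) + S(-t) = 1.
print(round(sigmoid(6.0) + sigmoid(-6.0), 7))    # → 1.0
```

The saturation at 0 and 1 for large negative and positive inputs is what gives the curve its “S” shape.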
───
has long been well known! Perhaps a passage of text can better explain its tangled ties to the 'perceptron'!?
In 1838 the Belgian mathematician Pierre François Verhulst published a 'population growth' equation,

$$\frac{dN}{dt} = r N \left(1 - \frac{N}{K}\right)$$

where $N(t)$ is the population at time $t$, $r$ is the natural growth rate, and $K$ is the environmental carrying capacity. Solving it gives

$$N(t) = \frac{K N_0 e^{rt}}{K + N_0 \left(e^{rt} - 1\right)}$$

where $N_0 = N(0)$ is the initial condition. Verhulst called this function the 'logistic function', and so the differential equation came to be called the 'logistic equation'. If we rewrite it with $x = N/K$ to 'standardize' it, and take $r = 1$ with $N_0 = K/2$, the solution becomes the sigmoid $x(t) = 1/(1 + e^{-t})$; its graph shows that $\lim_{t \to \infty} N(t) = K$, that is, the population can never grow beyond the carrying capacity!
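A quick numerical check of the closed-form solution: below, a simple Euler integration of $dN/dt = rN(1 - N/K)$ is compared against the formula (the values $r = 1$, $K = 1$, $N_0 = 1/2$ are arbitrary illustrative choices, picked so the solution is exactly the standard sigmoid):

```python
import math

r, K, N0 = 1.0, 1.0, 0.5

def N_exact(t):
    """Closed-form Verhulst solution N(t) = K*N0*e^(rt) / (K + N0*(e^(rt)-1))."""
    return K * N0 * math.exp(r * t) / (K + N0 * (math.exp(r * t) - 1.0))

# Euler integration of dN/dt = r*N*(1 - N/K) up to t = 5.
N, dt = N0, 1e-4
for _ in range(int(5.0 / dt)):
    N += dt * r * N * (1.0 - N / K)

print(N_exact(0.0))        # → 0.5, i.e. the sigmoid at t = 0
print(round(N_exact(5.0), 4), round(N, 4))  # both near K = 1
```

With these constants $N(t) = 1/(1+e^{-t})$, so the population curve and the sigmoid neuron's activation are literally the same function.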
If we take the inverse of the logistic function, we obtain a function called the 'logit', defined as

$$\operatorname{logit}(p) = \ln \frac{p}{1 - p}, \qquad 0 < p < 1$$

It is commonly used for 'binary choice', say the 'probability distribution' of 'To Be or Not To Be', and also in 'regression analysis' to see whether two 'variables' are statistically 'correlated' or 'uncorrelated'. Examining its behaviour near the endpoints $p = 0$ and $p = 1$ with 'infinitesimal' numbers may convey the 'polarity' even better!
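A short sketch of the inverse relationship, and of the 'polarity' at the two endpoints (the probe value $10^{-9}$ is an arbitrary stand-in for an infinitesimal):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    """Inverse of the logistic function: logit(p) = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# logit undoes sigmoid.
print(round(logit(sigmoid(2.0)), 6))   # → 2.0
# Near the endpoints the logit diverges toward -inf and +inf,
# and by symmetry logit(1 - p) = -logit(p).
print(logit(1e-9), logit(1.0 - 1e-9))
```

The divergence at 0 and 1 is the 'two poles': an infinitesimal step away from certainty in either direction corresponds to an unboundedly large log-odds.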
In 1976 the Australian scientist Robert McCredie May published a paper, 《Simple mathematical models with very complicated dynamics》, presenting the 'logistic map' recurrence

$$x_{n+1} = r \, x_n (1 - x_n)$$

This recurrence looks like a 'difference version' of the 'logistic equation', yet it turns out to be a classic example of 'chaotic' behaviour. If the recurrence has a 'limit value' $x^{\ast}$, then $x^{\ast} = r \, x^{\ast}(1 - x^{\ast})$, which gives $x^{\ast} = 0$ or $x^{\ast} = 1 - \frac{1}{r}$. For $0 < r < 1$ the map converges, quickly or slowly, to 'zero'; for $1 < r < 2$ it rapidly approaches $1 - \frac{1}{r}$; for $2 < r < 3$ it oscillates above and below $1 - \frac{1}{r}$ while converging linearly; at $r = 3$ it still converges to $1 - \frac{1}{r}$, but very slowly and no longer linearly; for $r > 3$, for almost every 'initial condition' the system begins a sustained 'oscillation' between two values, which then becomes four, eight, sixteen values, and so on; finally, at about $r \approx 3.57$, this oscillation pattern disappears and the system enters the so-called 'chaotic state'!!
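The regimes described above can be seen directly by iterating the map; the initial value 0.2 and the particular $r$ values below are illustrative choices:

```python
# Iterate the logistic map x_{n+1} = r*x_n*(1 - x_n), discard a long
# transient, and return the next few values of the orbit.
def orbit(r, x0=0.2, skip=1000, keep=8):
    x = x0
    for _ in range(skip):
        x = r * x * (1.0 - x)
    out = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        out.append(round(x, 4))
    return out

print(orbit(0.5))   # r < 1: collapses to 0
print(orbit(1.5))   # 1 < r < 3: settles at 1 - 1/r = 1/3
print(orbit(3.2))   # r > 3: period-2 oscillation between two values
print(orbit(3.9))   # r ≈ 3.9: chaotic, never settles
```

Sweeping $r$ continuously from 3 toward 3.57 would show the period doubling 2, 4, 8, 16, … cascading into chaos.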
The 'continuous' differential equation exhibits no 'chaos', yet the 'discrete' difference equation does; so is this 'quantum' 'universe' 'chaotic' or not?? Recalling the 'recursive functions' of the 'λ calculus' and the mathematical definition of a 'fixed point': the 'logistic map' can be seen as 'iterated evaluation' of the function $f(x) = r \, x(1 - x)$, that is, $x_{n+1} = f(x_n)$. When $f(x^{\ast}) = x^{\ast}$, that $x^{\ast}$ is a 'fixed point'; plotting the iteration for different values of $r$ shows the passage from a 'fixed point' through 'oscillation' to 'chaos'. If instead we rewrite the 'logistic equation' as the difference equation $N_{n+1} = N_n + r \, N_n \left(1 - \frac{N_n}{K}\right)$, its 'limit value' is $K$, which does not depend on $r$ at all; this shows that the two recurrences have different 'roots'! It does, however, suggest a 'time series' viewpoint: if we read $x_{n+1}$ as $x(t + \Delta t)$, then $x_{n+1} - x_n$ can be called a 'velocity', and the pair $(x_n, \, x_{n+1} - x_n)$ constitutes an imagined 'phase space', translating a 'recurrence relation' into a kind of 'symbolic dynamics'!!
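The contrast in 'roots' is easy to verify numerically: the difference version of the logistic equation settles at $K$ for every (small enough) $r$, whereas the logistic map's fixed point $1 - 1/r$ moves with $r$. The constants below are illustrative:

```python
# Difference version of the logistic equation:
#   N_{n+1} = N_n + r*N_n*(1 - N_n/K)
# Its fixed point is N = K, independent of r (stable for 0 < r < 2).
def difference_orbit(r, K=1.0, N0=0.1, steps=2000):
    N = N0
    for _ in range(steps):
        N = N + r * N * (1.0 - N / K)
    return N

for r in (0.1, 0.5, 1.0):
    print(round(difference_orbit(r), 6))  # each converges to K = 1
```

For the logistic map, by contrast, $r = 1.5$ settles at $1 - 1/1.5 = 1/3$ and $r = 1.9$ at $1 - 1/1.9 \approx 0.4737$: the limit tracks $r$, confirming the two recurrences are not the same object.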
For certain particular values of $r$ this 'recurrence relation' has an 'exact solution'. For example, when $r = 2$, let $y_n = 1 - 2 x_n$; since $x_{n+1} = 2 x_n (1 - x_n)$, we get $y_{n+1} = 1 - 2 x_{n+1} = (1 - 2 x_n)^2 = y_n^{\,2}$, so $y_n = y_0^{\,2^n}$, and therefore

$$x_n = \frac{1 - (1 - 2 x_0)^{2^n}}{2}$$

Moreover, because the 'exponent' $2^n$ is 'even', this 'symbolic dynamical system' approaches its 'limit value' $\frac{1}{2}$ at a non-uniform rate (non-linearly) and without oscillating.
── Excerpted from 《【Sonic π】電路學之補充《四》無窮小算術‧中下上》