W!o+'s 《小伶鼬工坊演義》: Neural Networks [Sigmoid] I

Since we are about to make use of the "S function" (the sigmoid), why not take advantage of Mathematica on the Raspberry Pi to get better acquainted with its properties? Here we simply quote the text of the Wolfram MathWorld page:

Sigmoid Function


The sigmoid function, also called the sigmoidal curve (von Seggern 2007, p. 148) or logistic function, is the function

y = \frac{1}{1+e^{-x}}. \ \ \ \ \ (1)

It has derivative

\frac{dy}{dx} = \left[1-y(x)\right] y(x) \ \ \ \ \ (2)
= \frac{e^{-x}}{(1+e^{-x})^2} \ \ \ \ \ (3)
= \frac{e^{x}}{(1+e^{x})^2} \ \ \ \ \ (4)

and indefinite integral

\int y\, dx = x + \ln(1+e^{-x}) \ \ \ \ \ (5)
= \ln(1+e^{x}). \ \ \ \ \ (6)

It has Maclaurin series

y(x) = \sum_{n=0}^{\infty} \frac{(-1)^n E_n(0)}{2\, n!}\, x^n \ \ \ \ \ (7)
= \sum_{n=0}^{\infty} \frac{(-1)^{n+1}(2^{n+1}-1)\, B_{n+1}}{(n+1)!}\, x^n \ \ \ \ \ (8)
= \frac{1}{2} + \frac{1}{4}x - \frac{1}{48}x^3 + \frac{1}{480}x^5 - \frac{17}{80640}x^7 + \frac{31}{1451520}x^9 - \cdots, \ \ \ \ \ (9)

where E_n(x) is an Euler polynomial and B_n is a Bernoulli number.
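
A short SymPy check of the expansion (again my sketch, not part of the quoted text): the first few Maclaurin coefficients should reproduce (9).

```python
import sympy as sp

x = sp.symbols('x')
y = 1 / (1 + sp.exp(-x))

# Maclaurin expansion up to x^9; the coefficients should match (9)
print(sp.series(y, x, 0, 10))
# expected: 1/2 + x/4 - x**3/48 + x**5/480 - 17*x**7/80640 + 31*x**9/1451520 + O(x**10)
```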

It has an inflection point at x=0, where

y''(x) = -\frac{e^{x}(e^{x}-1)}{(e^{x}+1)^3} = 0. \ \ \ \ \ (10)
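
The second derivative in (10) can be checked the same way (a SymPy aside of mine):

```python
import sympy as sp

x = sp.symbols('x')
y = 1 / (1 + sp.exp(-x))
y2 = sp.diff(y, x, 2)

# y'' agrees with (10) and vanishes at the inflection point x = 0
print(sp.simplify(y2 + sp.exp(x) * (sp.exp(x) - 1) / (sp.exp(x) + 1)**3))   # 0
print(y2.subs(x, 0))                                                        # 0
```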

It is also the solution to the ordinary differential equation

\frac{dy}{dx} = y(1-y) \ \ \ \ \ (11)

with initial condition y(0)=1/2.
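
As a numerical sanity check (my sketch, using SciPy rather than Mathematica), integrating the logistic equation (11) from y(0)=1/2 reproduces the closed form (1):

```python
import numpy as np
from scipy.integrate import solve_ivp

# integrate dy/dx = y(1 - y) with y(0) = 1/2 and compare with 1/(1 + e^{-x})
xs = np.linspace(0.0, 6.0, 61)
sol = solve_ivp(lambda t, y: y * (1 - y), (0.0, 6.0), [0.5],
                t_eval=xs, rtol=1e-10, atol=1e-12)
sigmoid = 1.0 / (1.0 + np.exp(-xs))
print(np.max(np.abs(sol.y[0] - sigmoid)))    # tiny residual, e.g. below 1e-8
```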

 

As the Wikipedia entry puts it, the "S function", i.e. the sigmoid function, is really a family of curves characterized by a differential equation:

Properties

In general, a sigmoid function is real-valued and differentiable, having either a non-negative or non-positive first derivative which is bell shaped. There are also a pair of horizontal asymptotes as t \rightarrow \pm \infty. The differential equation \tfrac{\mathrm{d}}{\mathrm{d}t} S(t) = c_1 S(t) \left( c_2 - S(t) \right), with the inclusion of a boundary condition providing a third degree of freedom, c_3, provides a class of functions of this type.

The logistic function has this further, important property, that its derivative can be expressed by the function itself,

S'(t)= S(t)(1-S(t)).
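
The general equation \tfrac{\mathrm{d}}{\mathrm{d}t} S(t) = c_1 S(t)(c_2 - S(t)) can be made concrete with one convenient parameterization in which c_3 enters as a time shift; the SymPy sketch below (the parameterization is my illustration, not taken from the quoted article) verifies it and recovers the logistic special case:

```python
import sympy as sp

t, c1, c2, c3 = sp.symbols('t c1 c2 c3', real=True)

# one solution family of dS/dt = c1*S*(c2 - S); here c3 enters as a time shift
S = c2 / (1 + sp.exp(-c1 * c2 * (t - c3)))
print(sp.simplify(sp.diff(S, t) - c1 * S * (c2 - S)))     # 0

# with c1 = c2 = 1 and c3 = 0 this is the logistic function, and S' = S(1 - S)
S1 = S.subs({c1: 1, c2: 1, c3: 0})
print(sp.simplify(sp.diff(S1, t) - S1 * (1 - S1)))        # 0
```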

 

Because its slope is everywhere non-negative or everywhere non-positive, a curve in the "S function" family can only rise or fall monotonically, meeting its two horizontal asymptotes only at infinity \infty. It is precisely this character of a slow start, a rapid middle, and a gradual approach to saturation that makes it so well suited to describing a "learning process"!

Examples

Many natural processes, such as those of complex system learning curves, exhibit a progression from small beginnings that accelerates and approaches a climax over time. When a detailed description is lacking, a sigmoid function is often used.[2]

Besides the logistic function, sigmoid functions include the ordinary arctangent, the hyperbolic tangent, the Gudermannian function, and the error function, but also the generalised logistic function and algebraic functions like f(x)=\tfrac{x}{\sqrt{1+x^2}}.

The integral of any smooth, positive, “bump-shaped” function will be sigmoidal, thus the cumulative distribution functions for many common probability distributions are sigmoidal. The most famous such example is the error function, which is related to the cumulative distribution function (CDF) of a normal distribution.
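
The connection between the error function and the normal CDF mentioned above can be spot-checked numerically (a short SciPy aside of mine, not from the article), using \Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}(x/\sqrt{2})\right):

```python
import numpy as np
from scipy.special import erf
from scipy.stats import norm

# the standard normal CDF is itself a sigmoid: Phi(x) = (1 + erf(x/sqrt(2))) / 2
xs = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(norm.cdf(xs) - 0.5 * (1.0 + erf(xs / np.sqrt(2.0))))))   # ~0
```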

[Figure 700px-Gjl-t(x).svg: some sigmoid functions compared. All functions are normalized so that their slope at the origin is 1.]
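
The normalization used in that figure is easy to reproduce; the sketch below (my own choice of scalings, assuming NumPy and SciPy) rescales each curve from the list above so that its slope at the origin is 1 and confirms it numerically:

```python
import numpy as np
from scipy.special import erf

# each curve scaled so its slope at the origin is 1, as in the figure
curves = {
    "logistic 2/(1+e^-2x) - 1":    lambda x: 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0,
    "tanh(x)":                     np.tanh,
    "arctan(pi x/2) * 2/pi":       lambda x: np.arctan(np.pi * x / 2.0) * 2.0 / np.pi,
    "gd(x) = 2 arctan(tanh(x/2))": lambda x: 2.0 * np.arctan(np.tanh(x / 2.0)),
    "erf(sqrt(pi) x / 2)":         lambda x: erf(np.sqrt(np.pi) * x / 2.0),
    "x / sqrt(1 + x^2)":           lambda x: x / np.sqrt(1.0 + x * x),
}

h = 1e-6
for name, f in curves.items():
    slope0 = (f(h) - f(-h)) / (2.0 * h)       # central difference at the origin
    print(f"{name:30s} slope at 0 ~ {slope0:.6f}, f(3) = {f(3.0):.3f}")
```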

───

 

This should be enough background to follow Michael Nielsen's explanation of the "sigmoid neuron":

Okay, let me describe the sigmoid neuron. We’ll depict sigmoid neurons in the same way we depicted perceptrons:

 

Just like a perceptron, the sigmoid neuron has inputs, x_1, x_2,\cdots . But instead of being just 0 or 1, these inputs can also take on any values between 0 and 1. So, for instance, 0.683 \cdots is a valid input for a sigmoid neuron. Also just like a perceptron, the sigmoid neuron has weights for each input, w_1, w_2,\cdots , and an overall bias, b. But the output is not 0 or 1. Instead, it’s \sigma(w \cdot x+b), where \sigma is called the sigmoid function*

*Incidentally, σ is sometimes called the logistic function, and this new class of neurons called logistic neurons. It’s useful to remember this terminology, since these terms are used by many people working with neural nets. However, we’ll stick with the sigmoid terminology.

, and is defined by:

\sigma(z) \equiv \frac{1}{1+e^{-z}}. \ \ \ \ \ (3)

 

To put it all a little more explicitly, the output of a sigmoid neuron with inputs x_1, x_2,\cdots , weights w_1, w_2,\cdots , and bias b is

\frac{1}{1+\exp(-\sum_j w_j x_j-b)}. \ \ \ \ \ (4)
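
Equations (3) and (4) translate directly into a few lines of NumPy; this is my own sketch of a single sigmoid neuron (the function names and sample numbers are illustrative, not Nielsen's code):

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function of equation (3)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron_output(w, x, b):
    """Output of a sigmoid neuron, equation (4): sigma(sum_j w_j x_j + b)."""
    return sigmoid(np.dot(w, x) + b)

# inputs may take any value between 0 and 1, e.g. 0.683
w = np.array([0.7, -1.2])       # one weight per input
x = np.array([0.683, 0.1])      # the inputs
b = 0.4                         # the overall bias
print(sigmoid_neuron_output(w, x, b))
```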

 

At first sight, sigmoid neurons appear very different to perceptrons. The algebraic form of the sigmoid function may seem opaque and forbidding if you’re not already familiar with it. In fact, there are many similarities between perceptrons and sigmoid neurons, and the algebraic form of the sigmoid function turns out to be more of a technical detail than a true barrier to understanding.

To understand the similarity to the perceptron model, suppose z \equiv w \cdot x + b is a large positive number. Then e^{-z} \approx 0 and so \sigma(z) \approx 1. In other words, when z = w \cdot x+b is large and positive, the output from the sigmoid neuron is approximately 1, just as it would have been for a perceptron. Suppose on the other hand that z = w \cdot x+b is very negative. Then e^{-z} \rightarrow \infty, and \sigma(z) \approx 0. So when z = w \cdot x+b is very negative, the behaviour of a sigmoid neuron also closely approximates a perceptron. It’s only when w \cdot x+b is of modest size that there’s much deviation from the perceptron model.
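
That limiting behaviour is easy to see numerically (again my aside, not part of the quoted text): for large |z| the output saturates at 0 or 1, just like a perceptron, and only for modest z does it differ from a step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigma(z) -> 1 for large positive z, -> 0 for very negative z
for z in (-50.0, -10.0, -2.0, 0.0, 2.0, 10.0, 50.0):
    print(f"z = {z:+6.1f}   sigma(z) = {sigmoid(z):.6f}")
```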

───

 

If you spend a little more time playing with this minimal expression

\sigma(z \equiv w \cdot x + b) = \frac{1}{1+e^{-(w \cdot x + b)}}

you will find that (see the sketch after this list):

The larger w is, the steeper the "S curve" becomes and the more it resembles a threshold; the smaller w is, the flatter and straighter it looks.

If w changes sign between positive and negative, the "S curve" is mirrored left to right.

b determines where the "midpoint" output \frac{1}{2} sits: the curve crosses \frac{1}{2} where w \cdot x + b = 0.

‧……
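
A minimal sketch of these observations (my own NumPy illustration): sample \sigma(wx + b) for a single input x on a small grid and watch how w and b reshape the curve.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

xs = np.linspace(-6.0, 6.0, 7)
# larger |w| -> steeper, more threshold-like; w < 0 mirrors the curve;
# b moves the point where the output crosses 1/2 to x = -b/w
for w, b in [(1.0, 0.0), (5.0, 0.0), (0.2, 0.0), (-1.0, 0.0), (1.0, 2.0)]:
    print(f"w = {w:+.1f}, b = {b:+.1f} :", np.round(sigma(w * xs + b), 3))
```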

 

Could these be the very reasons why the "S neuron" is so easy to "teach"??