W!o+'s 《小伶鼬工坊演義》: Neural Networks 【Sigmoid】 Part 3

Before turning to handwritten digit recognition with neural networks, Mr. Michael Nielsen first introduces their main architecture and the terminology he will use:

The architecture of neural networks

In the next section I’ll introduce a neural network that can do a pretty good job classifying handwritten digits. In preparation for that, it helps to explain some terminology that lets us name different parts of a network. Suppose we have the network:

As mentioned earlier, the leftmost layer in this network is called the input layer, and the neurons within the layer are called input neurons. The rightmost or output layer contains the output neurons, or, as in this case, a single output neuron. The middle layer is called a hidden layer, since the neurons in this layer are neither inputs nor outputs. The term “hidden” perhaps sounds a little mysterious – the first time I heard the term I thought it must have some deep philosophical or mathematical significance – but it really means nothing more than “not an input or an output”. The network above has just a single hidden layer, but some networks have multiple hidden layers. For example, the following four-layer network has two hidden layers:

Somewhat confusingly, and for historical reasons, such multiple layer networks are sometimes called multilayer perceptrons or MLPs, despite being made up of sigmoid neurons, not perceptrons. I’m not going to use the MLP terminology in this book, since I think it’s confusing, but wanted to warn you of its existence.

───
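The terminology above can be pinned down with a small sketch. It assumes, purely for illustration, that a fully connected feedforward network is described by a list of layer sizes; the sizes used below are hypothetical and are not read off Nielsen's figures. The first entry is the input layer, the last is the output layer, and every layer in between is hidden, meaning nothing more than "not an input or an output".

```python
def describe(sizes):
    """Summarise a fully connected feedforward network from its layer sizes.

    sizes[0] is the input layer, sizes[-1] the output layer, and every
    layer in between is a hidden layer.
    """
    if len(sizes) < 2:
        raise ValueError("need at least an input layer and an output layer")
    hidden = sizes[1:-1]
    # Each neuron in layer l+1 receives one weight from every neuron in layer l.
    n_weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    # Input neurons carry no bias; every other neuron has exactly one.
    n_biases = sum(sizes[1:])
    return {
        "layers": len(sizes),
        "input_neurons": sizes[0],
        "hidden_layers": len(hidden),
        "output_neurons": sizes[-1],
        "weights": n_weights,
        "biases": n_biases,
    }


# Hypothetical sizes: a network with one hidden layer, then one with two.
print(describe([6, 4, 1]))
print(describe([6, 4, 3, 1]))
```

Counting weights and biases this way also shows how quickly the number of parameters grows as hidden layers are added.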

 

Consider the mathematical notion of a graph:

Graph (mathematics)

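In mathematics, a graph is a way of representing the relationships between objects, and it is the basic object of study in graph theory. A graph looks like a collection of small dots (called vertices or nodes) joined by straight or curved lines (called edges).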

───

 

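From this topological point of view, the connectivity of the neural networks we are discussing is quite simple, genuinely simpler than that of a knowledge network, or even a metro network: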

At the start of Chapter 3 of 《Simply Logical: Intelligent Reasoning by Example》, Peter Flach says:

3 Logic Programming and Prolog
In the previous chapters we have seen how logic can be used to represent knowledge about a particular domain, and to derive new knowledge by means of logical inference. A distinct feature of logical reasoning is the separation between model theory and proof theory: a set of logical formulas determines the set of its models, but also the set of formulas that can be derived by applying inference rules. Another way to say the same thing is: logical formulas have both a declarative meaning and a procedural meaning. For instance, declaratively the order of the atoms in the body of a clause is irrelevant, but procedurally it may determine the order in which different answers to a query are found.

Because of this procedural meaning of logical formulas, logic can be used as a programming language. If we want to solve a problem in a particular domain, we write down the required knowledge and apply the inference rules built into the logic programming language. Declaratively, this knowledge specifies what the problem is, rather than how it should be solved. The distinction between declarative and procedural aspects of problem solving is succinctly expressed by Kowalski’s equation

algorithm = logic + control

Here, logic refers to declarative knowledge, and control refers to procedural knowledge. The equation expresses that both components are needed to solve a problem algorithmically.

In a purely declarative programming language, the programmer would have no means to express procedural knowledge, because logically equivalent programs would behave identical. However, Prolog is not a purely declarative language, and therefore the procedural meaning of Prolog programs cannot be ignored. For instance, the order of the literals in the body of a clause usually influences the efficiency of the program to a large degree. Similarly, the order of clauses in a program often determines whether a program will give an answer at all. Therefore, in this chapter we will take a closer look at Prolog’s inference engine and its built-in features (some of which are non-declarative). Also, we will discuss some common programming techniques.
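Flach's remark that order is irrelevant declaratively but may determine the order in which answers are found can be imitated, loosely, in plain Python. The sketch below is neither Prolog nor pyDatalog: `next_stations` is a hypothetical helper that answers a query by scanning a list of 連接-style facts from first to last, using a few of the station pairs entered in the pyDatalog session further on. Reversing the facts changes the order of the answers, but not the set of answers.

```python
# Toy facts: (from_station, to_station) pairs. Plain Python, not Prolog.
FACTS = [
    ("台北車站", "中山"),
    ("台北車站", "善導寺"),
    ("西門", "龍山寺"),
    ("台北車站", "西門"),
]


def next_stations(station, facts):
    """Answer 'which stations does `station` connect to?' by scanning facts in order."""
    return [to for (frm, to) in facts if frm == station]


forward = next_stations("台北車站", FACTS)
backward = next_stations("台北車站", list(reversed(FACTS)))

print(forward)                        # ['中山', '善導寺', '西門']
print(backward)                       # ['西門', '善導寺', '中山']
print(set(forward) == set(backward))  # True
```

The sequence of answers is the procedural side; the set of answers is the declarative side.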

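Let us take a typical example, a small part of the Taipei Metro network, and use it to show how the difference between the declarative and the procedural meaning shows up in the way we think about, and write, programs.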

In this example, centered on the 忠孝新生 station, we include twenty-five metro stations, whose names are given below in no particular order:

松江南京、大安森林公園、善導寺、南京復興、忠孝復興  
台北小巨蛋、小南門、忠孝新生、中山、台北車站  
龍山寺、忠孝敦化、西門、雙連、中山國中  
信義安和、中正紀念堂、古亭、大安、行天宮  
東門、台大醫院、北門、科技大樓、世貿台北101

【Figure: example metro map (Taipei Metro)】
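Before the pyDatalog treatment that follows, the same graph idea can be sketched in plain Python. This is not how pyDatalog works internally, and the helper name `reachable` is hypothetical. Using only a handful of the directed 連接 pairs entered in the pyDatalog session below, the sketch checks that a one-way fact cannot be looked up in reverse, forms the symmetric closure by adding the reverse of every pair, and then runs a breadth-first search to find the reachable stations.

```python
from collections import deque

# A few directed 連接(□, ○) facts: □ connects directly to ○.
LINKS = [
    ("台北車站", "善導寺"),
    ("善導寺", "忠孝新生"),
    ("忠孝復興", "忠孝新生"),
    ("忠孝復興", "大安"),
    ("信義安和", "大安"),
    ("信義安和", "世貿台北101"),
]

directed = set(LINKS)
print(("世貿台北101", "信義安和") in directed)    # False: the facts are one-way

# Symmetric closure, the plain-Python analogue of 連接(X, Y) <= 連接(Y, X).
undirected = directed | {(b, a) for (a, b) in directed}
print(("世貿台北101", "信義安和") in undirected)  # True


def reachable(start, edges):
    """Breadth-first search: return every station reachable from `start`."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
    seen = {start}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        for nxt in neighbours.get(station, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print("台北車站" in reachable("世貿台北101", directed))    # False
print("台北車站" in reachable("世貿台北101", undirected))  # True
```

With one-way facts, 世貿台北101 has no outgoing links at all; once every link is made two-way, the search reaches 台北車站 via 信義安和, 大安, 忠孝復興, 忠孝新生 and 善導寺.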

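Suppose we define 連接(□, ○) to mean that from metro station □, without passing through any other station, one travels directly to metro station ○, in the direction □ → ○. We can then represent this part of the metro network as: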

```python
pi@raspberrypi ~ python3
Python 3.2.3 (default, Mar  1 2013, 11:53:50)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
# Define the station-to-station 連接 (direct connection) facts
>>> from pyDatalog import pyDatalog
>>> pyDatalog.create_terms('連接, 鄰近, 能達, 所有路徑')
>>> +連接('台北車站', '中山')
>>> +連接('台北車站', '善導寺')
>>> +連接('台北車站', '西門')
>>> +連接('台北車站', '台大醫院')
>>> +連接('西門', '龍山寺')
>>> +連接('西門', '小南門')
>>> +連接('西門', '北門')
>>> +連接('中山', '北門')
>>> +連接('中山', '雙連')
>>> +連接('中山', '松江南京')
>>> +連接('中正紀念堂', '小南門')
>>> +連接('中正紀念堂', '台大醫院')
>>> +連接('中正紀念堂', '東門')
>>> +連接('中正紀念堂', '古亭')
>>> +連接('東門', '古亭')
>>> +連接('東門', '忠孝新生')
>>> +連接('東門', '大安森林公園')
>>> +連接('善導寺', '忠孝新生')
>>> +連接('松江南京', '忠孝新生')
>>> +連接('忠孝復興', '忠孝新生')
>>> +連接('松江南京', '行天宮')
>>> +連接('松江南京', '南京復興')
>>> +連接('中山國中', '南京復興')
>>> +連接('台北小巨蛋', '南京復興')
>>> +連接('忠孝復興', '南京復興')
>>> +連接('忠孝復興', '忠孝敦化')
>>> +連接('忠孝復興', '大安')
>>> +連接('大安森林公園', '大安')
>>> +連接('科技大樓', '大安')
>>> +連接('信義安和', '大安')
>>> +連接('信義安和', '世貿台北101')
>>>
```

The 連接 facts above were entered in an arbitrary order. First, a network graph has no canonical ordering; second, the declarative content of a set of facts does not change with their order; and third, procedurally, pyDatalog places no requirement on the order of these facts either. However, 連接 has a direction, from a starting station to an ending station, so the statements above do not yet represent that metro network, as the following snippet shows. 【※ In pyDatalog, a query (ask or query) with no variables returns set([()]) to indicate that the fact exists, and None to indicate that what was asked is not a fact.】

```python
# One-way only
>>> pyDatalog.ask("連接('信義安和', '世貿台北101')") == set([()])
True
>>> pyDatalog.ask("連接('世貿台北101', '信義安和')") == set([()])
False
>>> pyDatalog.ask("連接('世貿台北101', '信義安和')") == None
True
>>>
```

We must therefore state that 連接(□, ○) is bidirectional, that is, add the rule

連接(X站名, Y站名) <= 連接(Y站名, X站名)

Because every pyDatalog term must be declared before it is used, and variables must begin with an uppercase letter, we need a statement such as

pyDatalog.create_terms('X站名, Y站名, Z站名, P路徑甲, P路徑乙')

【※ Chinese has no upper or lower case and is perhaps treated entirely as lowercase, so variable names have to begin with an uppercase Latin letter.】

───
Excerpted from 《勇闖新世界︰ 《 pyDatalog 》 導引《七》》 (http://www.freesandal.org/?p=37256)

If we think carefully about the expression $z_j = \sum_i w_{ji} \cdot x_i + b_j$, we may discover for ourselves that computation in a network of sigmoid (S-shaped) neurons is closely tied to the mathematics of matrices:

Matrix (mathematics)

Definition

A matrix is a rectangular array of numbers or other mathematical objects for which operations such as addition and multiplication are defined.[6] Most commonly, a matrix over a field F is a rectangular array of scalars each of which is a member of F.[7][8] Most of this article focuses on real and complex matrices, that is, matrices whose elements are real numbers or complex numbers, respectively. More general types of entries are discussed below. For instance, this is a real matrix:

\mathbf{A} = \begin{bmatrix} -1.3 & 0.6 \\ 20.4 & 5.5 \\ 9.7 & -6.2 \end{bmatrix}.

The numbers, symbols or expressions in the matrix are called its entries or its elements. The horizontal and vertical lines of entries in a matrix are called rows and columns, respectively.

Size

The size of a matrix is defined by the number of rows and columns that it contains. A matrix with m rows and n columns is called an m × n matrix or m-by-n matrix, while m and n are called its dimensions. For example, the matrix A above is a 3 × 2 matrix.

Matrices which have a single row are called row vectors, and those which have a single column are called column vectors. A matrix which has the same number of rows and columns is called a square matrix. A matrix with an infinite number of rows or columns (or both) is called an infinite matrix. In some contexts, such as computer algebra programs, it is useful to consider a matrix with no rows or no columns, called an empty matrix.

| Name | Size | Example | Description |
|---|---|---|---|
| Row vector | 1 × n | \begin{bmatrix}3 & 7 & 2 \end{bmatrix} | A matrix with one row, sometimes used to represent a vector |
| Column vector | n × 1 | \begin{bmatrix}4 \\ 1 \\ 8 \end{bmatrix} | A matrix with one column, sometimes used to represent a vector |
| Square matrix | n × n | \begin{bmatrix} 9 & 13 & 5 \\ 1 & 11 & 7 \\ 2 & 6 & 3 \end{bmatrix} | A matrix with the same number of rows and columns, sometimes used to represent a linear transformation from a vector space to itself, such as reflection, rotation, or shearing. |

……

Linear equations

Matrices can be used to compactly write and work with multiple linear equations, that is, systems of linear equations. For example, if A is an m-by-n matrix, x designates a column vector (that is, n×1-matrix) of n variables $x_1, x_2, \ldots, x_n$, and b is an m×1-column vector, then the matrix equation

Ax = b

is equivalent to the system of linear equations

\begin{aligned} A_{1,1}x_1 + A_{1,2}x_2 + \cdots + A_{1,n}x_n &= b_1 \\ &\;\vdots \\ A_{m,1}x_1 + A_{m,2}x_2 + \cdots + A_{m,n}x_n &= b_m. \end{aligned} [24]
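A quick numerical check of this equivalence, sketched in plain Python with the 3 × 2 matrix A from the definition above and an arbitrarily chosen (hypothetical) vector x: computing Ax as a matrix-vector product gives the same numbers as evaluating the left-hand side of each linear equation separately.

```python
# The 3 x 2 real matrix A quoted above, stored as a list of rows.
A = [[-1.3, 0.6],
     [20.4, 5.5],
     [9.7, -6.2]]
x = [2.0, 1.0]  # hypothetical column vector with n = 2 entries


def matvec(matrix, vector):
    """Matrix-vector product: entry i is sum_j matrix[i][j] * vector[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, vector)) for row in matrix]


b = matvec(A, x)
# Writing the m = 3 equations out one by one gives the same right-hand sides.
equations = [
    A[0][0] * x[0] + A[0][1] * x[1],
    A[1][0] * x[0] + A[1][1] * x[1],
    A[2][0] * x[0] + A[2][1] * x[1],
]
print(b)
print(all(abs(u - v) < 1e-12 for u, v in zip(b, equations)))  # True
```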

……

Relationship to linear maps

Linear maps $\mathbb{R}^n \to \mathbb{R}^m$ are equivalent to m-by-n matrices, as described above. More generally, any linear map $f\colon V \to W$ between finite-dimensional vector spaces can be described by a matrix $A = (a_{ij})$, after choosing bases $v_1, \ldots, v_n$ of V, and $w_1, \ldots, w_m$ of W (so n is the dimension of V and m is the dimension of W), which is such that

f(\mathbf{v}_j) = \sum_{i=1}^m a_{i,j} \mathbf{w}_i\qquad\mbox{for }j=1,\ldots,n.

In other words, column j of A expresses the image of $v_j$ in terms of the basis vectors $w_i$ of W; thus this relation uniquely determines the entries of the matrix A. Note that the matrix depends on the choice of the bases: different choices of bases give rise to different, but equivalent matrices.[60] Many of the above concrete notions can be reinterpreted in this light, for example, the transpose matrix $A^T$ describes the transpose of the linear map given by A, with respect to the dual bases.[61]
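In the special case of the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ (an assumption made here for simplicity, not the general setting of the quoted text), the statement that column j of A is the image of $v_j$ can be checked directly: multiplying A by the j-th standard basis vector picks out column j. A plain-Python sketch, reusing the 3 × 2 matrix A quoted earlier as a map from $\mathbb{R}^2$ to $\mathbb{R}^3$:

```python
A = [[-1.3, 0.6],
     [20.4, 5.5],
     [9.7, -6.2]]  # the 3 x 2 matrix from the definition: a map R^2 -> R^3


def matvec(matrix, vector):
    """Return the product matrix * vector, both given as plain lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def basis_vector(j, n):
    """The j-th standard basis vector e_j of R^n (0-based index j)."""
    return [1.0 if k == j else 0.0 for k in range(n)]


n = len(A[0])
for j in range(n):
    image = matvec(A, basis_vector(j, n))
    column = [row[j] for row in A]
    print(j, image == column)  # True for every j
```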

These properties can be restated in a more natural way: the category of all matrices with entries in a field k with multiplication as composition is equivalent to the category of finite dimensional vector spaces and linear maps over this field.

More generally, the set of m×n matrices can be used to represent the R-linear maps between the free modules $R^m$ and $R^n$ for an arbitrary ring R with unity. When n = m composition of these maps is possible, and this gives rise to the matrix ring of n×n matrices representing the endomorphism ring of $R^n$.

───
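Returning to $z_j = \sum_i w_{ji} \cdot x_i + b_j$: if the weights into a layer are collected into a matrix W whose row j holds the $w_{ji}$, then the whole vector of weighted inputs is the matrix-vector expression z = Wx + b, and the layer's outputs come from applying the sigmoid $\sigma(z) = 1/(1 + e^{-z})$ to each entry. A minimal pure-Python sketch, with hypothetical weights, biases and inputs:

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(W, x, b):
    """Outputs of one layer of sigmoid neurons.

    W[j][i] is the weight w_ji from input i to neuron j, so row j of W
    together with b[j] gives z_j = sum_i w_ji * x_i + b_j, i.e. z = W x + b.
    """
    z = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
         for row, b_j in zip(W, b)]
    return [sigmoid(z_j) for z_j in z]


# Hypothetical layer: 3 inputs feeding 2 sigmoid neurons.
W = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.5]]
b = [0.0, -1.0]
x = [1.0, 0.5, 2.0]

print(layer_output(W, x, b))  # z = [0.5, -0.5], so roughly [0.62, 0.38]
```

Each row of W is one neuron's weight vector, which is why a whole layer of sigmoid neurons reduces to one matrix-vector product followed by an elementwise nonlinearity.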

 

Why not take this opportunity to review it, or to learn it for the first time!