W!o+ 的《小伶鼬工坊演義》︰神經網絡【Perceptron】一

閱讀 Michael Nielsen 所寫的『感知器』闡釋︰

Perceptrons

What is a neural network? To get started, I’ll explain a type of artificial neuron called a perceptron. Perceptrons were developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts. Today, it’s more common to use other models of artificial neurons – in this book, and in much modern work on neural networks, the main neuron model used is one called the sigmoid neuron. We’ll get to sigmoid neurons shortly. But to understand why sigmoid neurons are defined the way they are, it’s worth taking the time to first understand perceptrons.

So how do perceptrons work? A perceptron takes several binary inputs, x_1, x_2, \cdots, and produces a single binary output:

In the example shown the perceptron has three inputs, x_1, x_2, x_3. In general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights,w_1,w_2,\cdots, real numbers expressing the importance of the respective inputs to the output. The neuron’s output, 0 or 1, is determined by whether the weighted sum \sum_j w_j x_j is less than or greater than some threshold value. Just like the weights, the threshold is a real number which is a parameter of the neuron. To put it in more precise algebraic terms:

output = \begin{cases}0 & \text{if } \sum_j w_j x_j \leq \ threshold\\1 & \text{if} \ \sum_j w_j x_j > \ threshold\end{cases}

 

That’s all there is to how a perceptron works!

……

 

當真單刀直入又簡潔直接!緊接說明如何用『感知器』建構邏輯閘『NAND』︰

I’ve described perceptrons as a method for weighing evidence to make decisions. Another way perceptrons can be used is to compute the elementary logical functions we usually think of as underlying computation, functions such as AND, OR, and NAND. For example, suppose we have a perceptron with two inputs, each with weight 2, and an overall bias of 3. Here’s our perceptron:

Then we see that input 00 produces output 1, since (-2)*0+(-2)*0+3 = 3 is positive. Here, I’ve introduced the symbol to make the multiplications explicit. Similar calculations show that the inputs 01 and 10 produce output 1. But the input 11 produces output 0, since (-2)*1+(-2)*1+3 = -1 is negative. And so our perceptron implements a NAND gate!

……

 

於是乎通熟邏輯者可以知道『感知器』的『網絡』能夠運算任何『邏輯表達式』!!

250px-Hasse_diagram_of_powerset_of_3.svg

500px-Vennandornot.svg

Venn diagram of A ↑ B

120px-NAND_ANSI_Labelled.svg

220px-7400

100px-CMOS_NAND_Layout.svg

220px-NXP-74AHC00D-HD

由於電流的『有無』和電壓的『高低』,與邏輯上的『真假』二態相似,所以很自然的布林代數也就進入『邏輯電路』的『設計』領域之中了。今天在『電子工程』領域裡的專業化了的布林代數叫做『邏輯代數』,而在『計算機科學』領域中之專門化的布林代數又稱作『布爾邏輯』。

一八八零年皮爾士已經發現『NAND』── Not AND ── 和『NOR』── Not OR ── 的邏輯『功能完備性』,還特別用了一個希臘語詞『ampheck』ἀμφήκης 雙刃,來表達它們都是『自足算子』;也就是說所有的邏輯表達式,都可以只用『NAND』或『NOR』構成。但是由於他從未發表過,於是三十三年後,一九一三年美國Henry M. Sheffer  於一篇提交給『Transactions of the American Mathematical Society 』的論文中,使用『豎線|』Strock 表示 NAND,同時他也論證了這個公理系統的完備性;因此現今 NAND 被稱作『Sheffer豎線 』,而 NOR 也叫做『皮爾士箭頭』。不知又是誰先想到這給邏輯電路的『工程實作』帶來了巨大的『經濟性』和『簡易性』??

未知何年何月,有一

孤虛者言︰
物有無者,非真假也。苟日新,日日新,又日新。真假者,物之論也。論也者,當或不當而已矣。故世有孤虛者,言有孤虛論。孤虛何謂也?甲乙孤虛,言不得全真也,索其孤其虛而已。天地孤虛,去其上下也,善惡孤虛,何得善惡並真乎?是故孤虛論全矣!其法曰︰物物孤虛,言物之非也;孤虛之孤虛,此孤虛 之非也。使甲與並,此甲乙辜虛之非也,強使之或,乃非甲非乙之孤虛也。若云由此及彼,雖言之鑿鑿,若非彼與此之孤虛,无能以斷疑是也!!

─── 摘自《布林代數

 

若是一個了解 λ 運算者,是否會將之描述成『感知器泛函』呢?

美國數學家阿隆佐‧邱奇 Alonzo Church 對『符號邏輯』symbolic logic 的熱情與對基礎算術『形式系統』的研究持續一生。當他發現這類系統容易受到『羅素悖論』影響時,轉將所設想的『λ 運算』lambda calculus 單獨用於研究『可計算性』問題, 同樣否定的回答了『判定性問題』 Decision-problem ──

一九零零年德國大數學家大衛‧希爾伯特 David Hilbert 在巴黎的國際數學家大會上作了場名為《數學問題》的演講,提出了二十三道待解決之最重要的數學問題,這就是著名的希爾伯特之二十三個問題的由來。這裡的『判定性問題』 就是二十三個問題的第二題『算術公理之相容性』,一九三零年已為庫爾特‧哥德爾否證 ──。美國 UCLA 大學 Enderton, Herbert B. 教授寫了一篇名為
INTRODUCTION
Alonzo Church: Life and Work》的 Alonzo Church 小傳︰

In 1936 a pair of papers by Church changed the course of logic. An Unsolvable Problem of Elementary Number Theory presents a definition and a theorem:
“The purpose of the present paper is to propose a definition of effective calculability which is thought to correspond satisfactorily to the somewhat vague intuitive notion . . . , and to show, by means of an example, that not every problem of this class is solvable.” The “definition” now goes by the name Church’s Thesis: “We now define the notion . . . of an effectively calculable function of positive integers by identifying it with the notion of a recursive function of positive integers (or of a λ-definable function of positive integers).” (The name, “Church’s Thesis,” was introduced by Kleene.)
The theorem in the paper is that there is a set that can be defined in the language of elementary number theory (viz. the set of G ̈odel numbers of formulas of the λ-calculus having a normal form) that is not recursive—although it is recursively enumerable. Thus truth in elementary number theory is an effectively unsolvable problem.

A sentence at the end of the paper adds the consequence that if the system of Principia Mathematica is ω-consistent, then its decision problem is unsolvable. It also follows that the system (if  ω-consistent) is incomplete, but of course G ̈odel had shown that in 1931.

220px-Euclid_flowchart_1

λ演算是一套用來研究『函式』Function 的『抽象』Abstraction 、函式的『應用』Apply 、『變元』Variable 的『替換』Substitution 以及『函式』之『化約』reduction 的形式語言。λ演算對於泛函式程式語言的興起有著巨大的推動力,比方說人工智慧領域著名的 Lisp 語言,以及後來的 ML 語言和 Haskell 語言。更令人驚訝的是它自身就是一個『世界上最小的通用性程式語言』。由於『函式』與『變元』兩者是任何人不管想用哪種『□□程式語言』來寫『演算法Algorithm 都需要清楚理解的概念。就讓我們踏上前人走過的足跡,回顧旅途上周遭景緻,說不定會有意外的收穫!!

─── 摘自《λ 運算︰淵源介紹

 

終究『感知器』之『網絡』也不過是另一種『計算機』而已??

The computational universality of perceptrons is simultaneously reassuring and disappointing. It’s reassuring because it tells us that networks of perceptrons can be as powerful as any other computing device. But it’s also disappointing, because it makes it seem as though perceptrons are merely a new type of NAND gate. That’s hardly big news!

However, the situation is better than this view suggests. It turns out that we can devise learning algorithms which can automatically tune the weights and biases of a network of artificial neurons. This tuning happens in response to external stimuli, without direct intervention by a programmer. These learning algorithms enable us to use artificial neurons in a way which is radically different to conventional logic gates. Instead of explicitly laying out a circuit of NAND and other gates, our neural networks can simply learn to solve problems, sometimes problems where it would be extremely difficult to directly design a conventional circuit.

───

 

當然是!但它『自動改寫』!!

最後我們介紹一下『何謂 Thue 之改寫系統』?結束這個《系列》的第一篇。阿克塞爾‧圖厄【挪威語 Axel Thue】一位數學家,以研究丟番圖用『有理數』逼近『實數』問題以及開拓『組合數學』之貢獻而聞名。他於一九一四發表了『詞之群論問題』Word problem for group 啟始了一個今天稱之為『字串改寫系統』SRS String Rewriting System 的先河,如從現今的研究和發現來看,它與圖靈機的『停機問題』密切相關。上個千禧年之時,John Colagioia 用『Semi-Thue System』寫了一個『奧秘的 esoteric 程式語言 Thue ,作者宣稱︰

Thue represents one of the simplest possible ways to construe 『constraint-based』基於約束 programming. It is to the constraint-based 『paradigm』典範 what languages like『 OISC 』── 單指令集電腦 One instruction set computer ── are to the imperative paradigm; in other words, it’s a 『tar pit』焦油坑.

─── 摘自《 Thue 之改寫系統《一》

 

彷彿本源自仙女之『圖靈機』也!!??

看來仙女早已遠離,那就說說存在的這台『圖靈機』吧!根據『 Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation (1st ed.)』所定義︰

單磁帶』one-tape 圖靈機是一個有序七元組 M= \langle Q, \Gamma, b, \Sigma, \delta, q_0, F \rangle ,此處

Q 是一個有限非空的『狀態』 state 集合

\Gamma 是一個有限非空的『磁帶上字母符號』集合

b \in \Gamma 是一個『空白符號』,唯一允許在任意計算步驟中無限次出現在磁帶上的符號

\Sigma\subseteq\Gamma\setminus\{b\} 是不包含空白符號的『輸入符號』集合

q_0 \in Q 是『初始狀態

F \subseteq Q 是『最終狀態』或稱作『接受狀態』,一般可能有 q_{accept}, q_{reject},q_{HALT}

\delta: Q \setminus F \times \Gamma \rightarrow Q \times \Gamma \times \{L,R\} 是稱作『轉移函式』transition function,其中 L,R 代表『讀寫頭』之向『左,右』移動,還有的增加了『不移動』no shift 的擴張

220px-Maquina

300px-Turing_machine_2b

500px-State_diagram_3_state_busy_beaver_4_

220px-Lego_Turing_Machine

220px-Hopcroft-ullman-old

圖靈機 M 將以下面方式運行:

開 機的時候首先將輸入符號字串 \omega=\omega_0\omega_1\ldots\omega_{n-1} \in \Sigma^* 依序的從左到右填寫在磁帶之第 0, 1, \ldots , n-1 編號的格子上, 將所有其它格子保持空白 ── 即填以空白符 b ──。 然後 M 的讀寫頭指向第 0 號格子,此時 M 處於狀態 q_0。 機器開始執行指令,按照轉移函式 \delta 所描述的規則進行逐步計算。 比方如果當前機器的狀態是 q,讀寫頭所指的格子中的符號是 x, 假使 \delta(q,x) = (q', x', L), 那麼機器下一步將進入新的狀態 q', 且將讀寫頭所指的格子中的符號改寫為 x', 然後把讀寫頭向左移動一個格子。 設使在某一時刻,讀寫頭所指的是第 0 號格子, 然而根據轉移函式它的下一步又將繼續向左移,此時它停在原處不動,也就是說,讀寫頭永遠不會移出了磁帶的左方界限。 設若在某個時刻 M 根據轉移函式進入了某種最終狀態 q_{final}, 則它立刻停機並留下磁帶上的結果字串。由於轉移函式 \delta 是一個部分函式,換句話說對於某些 q,x, \ \delta(q,x) 可能沒有可用的轉移定義, 如果在執行中遇到這種情況,機器依設計約定也將立即 q_{HALT} 停機。

─── 摘自《紙張、鉛筆和橡皮擦

 

所以才瞎子摸象說法林立的耶??!!