W!o+ 的《小伶鼬工坊演義》︰神經網絡與深度學習【引言】

雖說我們不能從維基百科詞條

人工神經網絡

人工神經網絡(artificial neural network,縮寫ANN),簡稱神經網絡(neural network,縮寫NN)或類神經網絡,是一種模仿生物神經網絡(動物的中樞神經系統,特別是大腦)的結構和功能的數學模型或計算模型。神經網絡由大量的人工神經元聯結進行計算。大多數情況下人工神經網絡能在外界信息的基礎上改變內部結構,是一種自適應系統。現代神經網絡是一種非線性統計性數據建模工具,常用來對輸入和輸出間複雜的關係進行建模,或用來探索數據的模式。

神經網絡是一種運算模型[1],由大量的節點(或稱「神經元」或「單元」)之間相互聯接構成。每個節點代表一種特定的輸出函數,稱為激勵函數(activation function)。每兩個節點間的連接都代表一個對於通過該連接信號的加權值,稱之為權重(weight),這相當於人工神經網路的記憶。網絡的輸出則依網絡的連接方式、權重值和激勵函數的不同而不同。而網絡自身通常都是對自然界某種算法或者函數的逼近,也可能是對一種邏輯策略的表達。

例如,一個用於手寫識別的神經網絡,是由一組可被輸入圖像的像素所激活的輸入神經元定義的。這些神經元的激活值,經過(由網絡設計者確定的)函數加權和變換之後,被傳遞到其他神經元。重複這一過程,直到最後的輸出神經元被激活,從而決定了被讀取的字。

它的構築理念是受到生物(人或其他動物)神經網絡功能的運作啟發而產生的。人工神經網絡通常是通過一個基於數學統計學類型的學習方法(Learning Method)得以優化,所以人工神經網絡也是數學統計學方法的一種實際應用,通過統計學的標準數學方法我們能夠得到大量的可以用函數來表達的局部結構空間;另一方面在人工智慧學的人工感知領域,我們通過數學統計學的應用可以來做人工感知方面的決定問題(也就是說通過統計學的方法,人工神經網絡能夠類似人一樣具有簡單的決定能力和簡單的判斷能力),這種方法比起正式的邏輯學推理演算更具有優勢。

───
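
上引詞條所說的『節點』、『權重』與『激勵函數』,可以用一小段 Python 草圖來示意(僅為概念示意,輸入與權重數值皆為假設)︰

import numpy as np

def sigmoid(z):
    # 常見的激勵函數之一:把加權總和壓縮到 (0, 1) 之間
    return 1.0 / (1.0 + np.exp(-z))

# 一個「人工神經元」:輸入經權重加權、加上偏置,再通過激勵函數
x = np.array([0.5, -1.0, 2.0])    # 假設的輸入訊號
w = np.array([0.8,  0.2, -0.5])   # 假設的連接權重(相當於網絡的「記憶」)
b = 0.1                           # 偏置

print(sigmoid(np.dot(w, x) + b))  # 此神經元的輸出

真正的網絡不過是把許多這樣的節點層層相連,再由學習算法調整各個 w 與 b 而已。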

 

真實知道『神經網絡』到底是什麼嗎?彷彿不外乎熟悉『名詞』而已!

但是《老子》為什麼卻又講

道可道,非常道。名可名,非常名。無名天地之始,有名萬物之母 。故常無欲,以觀其妙;常有欲,以觀其徼。此兩者同出而異名,同謂之玄,玄之又玄,眾妙之門。

 

因此所謂

深度學習

深度學習(英語:deep learning)是機器學習的一個分支,它基於試圖使用包含複雜結構或由多重非線性變換構成的多個處理層對資料進行高層抽象的一系列演算法[1][2][3][4][5]。

深度學習是機器學習中表征學習方法的一類。一個觀測值(例如一幅圖像)可以使用多種方式來表示,如每個像素強度值的向量,或者更抽象地表示成一系列邊、特定形狀的區域。而使用某些特定的表示方法可以更容易地從例項中學習任務(例如,人臉識別或面部表情識別[6])。深度學習的好處之一是將用非監督式或半監督式的特徵學習和分層特徵提取的高效演算法來替代手工取得特徵[7]。

表征學習的目標是尋求更好的表示方法並建立更好的模型來從大規模未標記資料中學習這些表示方法。一些表達方式的靈感來自於神經科學的進步,並鬆散地建立在神經系統中的資訊處理和通信模式的理解基礎上,如神經編碼,試圖定義刺激和神經元的反應之間的關係以及大腦中的神經元的電活動之間的關係。[8]

至今已有多種深度學習框架,如深度神經網路、卷積神經網路、深度信念網路和遞迴神經網路,已被應用於電腦視覺、語音識別、自然語言處理、音訊識別與生物資訊學等領域,並取得了極好的效果。

另外,深度學習已成為一個時髦術語,或者說是神經網路的品牌重塑。[9][10]

───

 

 亦藏於『眾妙之門』之玄境耶!?

若問『廣泛學習』可否得到利益?

Artificial neural network

In machine learning and cognitive science, artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected “neurons” which exchange messages between each other. The connections have numeric weights that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning.

For example, a neural network for handwriting recognition is defined by a set of input neurons which may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network’s designer), the activations of these neurons are then passed on to other neurons. This process is repeated until finally, an output neuron is activated. This determines which character was read.

Like other machine learning methods – systems that learn from data – neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition.

[Figure: Colored_neural_network.svg]

An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.

……

 

探索歷史

History

Warren McCulloch and Walter Pitts[2] (1943) created a computational model for neural networks based on mathematics and algorithms called threshold logic. This model paved the way for neural network research to split into two distinct approaches. One approach focused on biological processes in the brain and the other focused on the application of neural networks to artificial intelligence.

Hebbian learning

In the late 1940s psychologist Donald Hebb[3] created a hypothesis of learning based on the mechanism of neural plasticity that is now known as Hebbian learning. Hebbian learning is considered to be a ‘typical’ unsupervised learning rule and its later variants were early models for long term potentiation. Researchers started applying these ideas to computational models in 1948 with Turing’s B-type machines.

Farley and Wesley A. Clark[4] (1954) first used computational machines, then called “calculators,” to simulate a Hebbian network at MIT. Other neural network computational machines were created by Rochester, Holland, Habit, and Duda[5] (1956).

Frank Rosenblatt[6] (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer computer learning network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, a circuit which could not be processed by neural networks until after the backpropagation algorithm was created by Paul Werbos[7] (1975).

Neural network research stagnated after the publication of machine learning research by Marvin Minsky and Seymour Papert[8] (1969), who discovered two key issues with the computational machines that processed neural networks. The first was that basic perceptrons were incapable of processing the exclusive-or circuit. The second significant issue was that computers didn’t have enough processing power to effectively handle the long run time required by large neural networks. Neural network research slowed until computers achieved greater processing power.

Backpropagation and Resurgence

A key advance that came later was the backpropagation algorithm which effectively solved the exclusive-or problem, and more generally the problem of quickly training multi-layer neural networks (Werbos 1975).[7]

In the mid-1980s, parallel distributed processing became popular under the name connectionism. The textbook by David E. Rumelhart and James McClelland[9] (1986) provided a full exposition of the use of connectionism in computers to simulate neural processes.

Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and the biological architecture of the brain is debated; it’s not clear to what degree artificial neural networks mirror brain function.[10]

Support vector machines and other, much simpler methods such as linear classifiers gradually overtook neural networks in machine learning popularity. But the advent of deep learning in the late 2000s sparked renewed interest in neural nets.

Improvements since 2006

Computational devices have been created in CMOS, for both biophysical simulation and neuromorphic computing. More recent efforts show promise for creating nanodevices[11] for very large scale principal components analyses and convolution. If successful, these efforts could usher in a new era of neural computing[12] that is a step beyond digital computing, because it depends on learning rather than programming and because it is fundamentally analog rather than digital even though the first instantiations may in fact be with CMOS digital devices.

Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA have won eight international competitions in pattern recognition and machine learning.[13][14] For example, the bi-directional and multi-dimensional long short term memory (LSTM)[15][16][17][18] of Alex Graves et al. won three competitions in connected handwriting recognition at the 2009 International Conference on Document Analysis and Recognition (ICDAR), without any prior knowledge about the three different languages to be learned.

Fast GPU-based implementations of this approach by Dan Ciresan and colleagues at IDSIA have won several pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition,[19][20] the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy Stacks challenge,[21] and others. Their neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance[22] on important benchmarks such as traffic sign recognition (IJCNN 2012), or the MNIST handwritten digits problem of Yann LeCun at NYU.

Deep, highly nonlinear neural architectures similar to the 1980 neocognitron by Kunihiko Fukushima[23] and the “standard architecture of vision”,[24] inspired by the simple and complex cells identified by David H. Hubel and Torsten Wiesel in the primary visual cortex, can also be pre-trained by unsupervised methods[25][26] of Geoff Hinton‘s lab at University of Toronto.[27][28] A team from this lab won a 2012 contest sponsored by Merck to design software to help find molecules that might lead to new drugs.[29]

───
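
上文提到,基本感知器處理不了 XOR,要等反向傳播演算法出現,多層網絡才學得會它。下面用純 numpy 寫個極小的草圖(非任何原始論文之程式;網絡大小、學習率、迭代次數皆為假設)︰

import numpy as np

np.random.seed(0)

# XOR 的四組輸入與目標輸出
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 個輸入 -> 4 個隱藏神經元 -> 1 個輸出
W1 = np.random.randn(2, 4); b1 = np.zeros(4)
W2 = np.random.randn(4, 1); b2 = np.zeros(1)

eta = 0.5                          # 學習率(假設值)
for step in range(10000):
    H = sigmoid(X.dot(W1) + b1)    # 前向傳播:隱藏層
    Y = sigmoid(H.dot(W2) + b2)    # 前向傳播:輸出層
    dY = Y - T                     # 反向傳播:交叉熵損失對輸出層預激活的梯度
    dH = dY.dot(W2.T) * H * (1 - H)
    W2 -= eta * H.T.dot(dY); b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T.dot(dH); b1 -= eta * dH.sum(axis=0)

print(np.round(Y, 2))              # 多數隨機初始化下應接近 [[0], [1], [1], [0]]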

 

能否有所啟發!誠大哉問的也?!

故而只因『小巧完整』,宣說 Michael Nielsen 的

Neural Networks and Deep Learning 文本︰

Neural networks are one of the most beautiful programming paradigms ever invented. In the conventional approach to programming, we tell the computer what to do, breaking big problems up into many small, precisely defined tasks that the computer can easily perform. By contrast, in a neural network we don’t tell the computer how to solve our problem. Instead, it learns from observational data, figuring out its own solution to the problem at hand.

Automatically learning from data sounds promising. However, until 2006 we didn’t know how to train neural networks to surpass more traditional approaches, except for a few specialized problems. What changed in 2006 was the discovery of techniques for learning in so-called deep neural networks. These techniques are now known as deep learning. They’ve been developed further, and today deep neural networks and deep learning achieve outstanding performance on many important problems in computer vision, speech recognition, and natural language processing. They’re being deployed on a large scale by companies such as Google, Microsoft, and Facebook.

The purpose of this book is to help you master the core concepts of neural networks, including modern techniques for deep learning. After working through the book you will have written code that uses neural networks and deep learning to solve complex pattern recognition problems. And you will have a foundation to use neural networks and deep learning to attack problems of your own devising.

……

It’s rare for a book to aim to be both principle-oriented and hands-on. But I believe you’ll learn best if we build out the fundamental ideas of neural networks. We’ll develop living code, not just abstract theory, code which you can explore and extend. This way you’ll understand the fundamentals, both in theory and practice, and be well set to add further to your knowledge.

 

且留大部頭之『未出版』大作

Deep Learning

An MIT Press book in preparation

Ian Goodfellow, Yoshua Bengio and Aaron Courville

 

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The book will be available for sale soon, and will remain available online for free.

Citing the book in preparation

To cite this book in preparation, please use this bibtex entry:

 

@unpublished{Goodfellow-et-al-2016-Book,
    title={Deep Learning},
    author={Ian Goodfellow, Yoshua Bengio, and Aaron Courville},
    note={Book in preparation for MIT Press},
    url={http://www.deeplearningbook.org},
    year={2016}
}

───

 

于有興趣者自享的哩??!!

 

 

 

 

 

 

 

 

 

 

 

 

 

W!o+ 的《小伶鼬工坊演義》︰巴蛇食象

俗話說︰人心不足蛇吞象。用以表達過份貪婪。據維基百科詞條講,語源出自《山海經》之『巴蛇食象』︰

山海經校注·海內南經

  (山海經第十·山海經海經新釋卷五)

15、巴蛇食象,三歲而出其骨,君子服之,無心腹之疾①。其為蛇青黃赤黑②。一曰黑蛇青首③,在犀牛西。

① 郭璞云:“今南方(虫丹)蛇(藏經本作蟒蛇——珂)吞鹿,鹿已爛,自絞於樹腹中,骨皆穿鱗甲間出,此其類也。楚詞曰:‘有蛇吞象,厥大何如?’說者云長千尋。”郝懿行云:“今楚詞天問作‘一蛇吞象’,與郭所引異。王逸注引此經作‘靈蛇吞象’,並與今本異也。”珂案:淮南子本經篇云:“羿斷修蛇於洞庭。”路史後紀十以“修蛇”作“長它”,羅苹注云:“長它即所謂巴蛇,在江岳間。其墓今巴陵之巴丘,在州治側。江源記(即江記,六朝宋庾仲雍撰——珂)云:‘羿屠巴蛇於洞庭,其骨若陵,曰巴陵也。’”岳陽風土記(宋范致明撰)亦云:“今巴蛇冢在州院廳側,巍然而高,草木叢翳。兼有巴蛇廟,在岳陽門內。”又云:“象骨山。山海經云:‘巴蛇吞象。’暴其骨於此。山旁湖謂之象骨港。”是均從此經及淮南子附會而生出之神話。然而既有冢有廟,有山有港,言之確鑿,則知傳播於民間亦已久矣。

② 珂案:言其文采斑爛也。

③ 珂案:海內經云:“有巴遂山,澠水出焉。又有朱卷之國。有黑蛇,青首,食象。”即此。巴,小篆作□,說文十四云:“蟲也;或曰:食象蛇。象形。”則所象者,物在蛇腹彭亨之形。山海經多稱大蛇,如北山經云:“大咸之山,有蛇名曰長蛇,其毛如彘毫,其音如鼓柝。”北次三經云:“錞於毋逢之山,是有大蛇,赤首白身,其音如牛,見則其邑大旱。”是可以“吞象”矣。水經注葉榆河云:“山多大蛇,名曰髯蛇,長十丈,圍七八尺,常在樹上伺鹿獸,鹿獸過,便低頭繞之。有頃鹿死,先濡令濕訖,便吞,頭角骨皆鑽皮出。山夷始見蛇不動時,便以大竹籤籤蛇頭至尾,殺而食之,以為珍異。”即郭注所謂(虫丹)蛇也。

───

 

然而從《山海經》的說法『君子服之,無心腹之疾。』來看,一點也沒有貪婪的意思吧!郭璞認為『巴蛇』是『蟒蛇』一類。無怪乎『TensorFlow』能在二十行內完成『手寫阿拉伯數字』辨識程式︰

MNIST For ML Beginners

This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is, and what softmax (multinomial logistic) regression is, you might prefer this faster paced tutorial. Be sure to install TensorFlow before starting either tutorial.

When one learns how to program, there’s a tradition that the first thing you do is print “Hello World.” Just like programming has Hello World, machine learning has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:

It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we’re going to train a model to look at images and predict what digits they are. Our goal isn’t to train a really elaborate model that achieves state-of-the-art performance — although we’ll give you code to do that later! — but rather to dip a toe into using TensorFlow. As such, we’re going to start with a very simple model, called a Softmax Regression.

The actual code for this tutorial is very short, and all the interesting stuff happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. Because of this, we are going to very carefully work through the code.

……

# The MNIST Data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Implementing the Regression
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Training
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluating Our Model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))



 

This should be about 91%.

Is that good? Well, not really. In fact, it’s pretty bad. This is because we’re using a very simple model. With some small changes, we can get to 97%. The best models can get to over 99.7% accuracy! (For more information, have a look at this list of results.)

What matters is that we learned from this model. Still, if you’re feeling a bit down about these results, check out the next tutorial where we do a lot better, and learn how to build more sophisticated models using TensorFlow!

───
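
教程裡 y = softmax(xW + b) 與交叉熵那幾行究竟在算什麼,可以用純 numpy 粗略示意(僅是概念草圖,批次資料與標籤皆為假設,並非教程原碼)︰

import numpy as np

def softmax(z):
    # 先減去每列最大值避免指數溢位,再正規化成機率分佈
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 假設的小批次:2 張攤平成 784 維的影像,10 類 one-hot 標籤(5 與 0)
x  = np.random.rand(2, 784)
y_ = np.eye(10)[[5, 0]]

W = np.zeros((784, 10))
b = np.zeros(10)

y = softmax(x.dot(W) + b)                # 對應 tf.nn.softmax(tf.matmul(x, W) + b)
cross_entropy = -np.sum(y_ * np.log(y))  # 對應 -tf.reduce_sum(y_ * tf.log(y))
print(cross_entropy)                     # 權重全零時每筆約 log(10) ≈ 2.3,兩筆合計約 4.6

梯度下降所做的,就是一步步調整 W 與 b,把這個交叉熵壓低。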

 

考之以『THE MNIST DATABASE』上的『錯誤率』比較表︰

CLASSIFIER PREPROCESSING TEST ERROR RATE (%) Reference
Linear Classifiers
linear classifier (1-layer NN) none 12.0 LeCun et al. 1998
linear classifier (1-layer NN) deskewing 8.4 LeCun et al. 1998
pairwise linear classifier deskewing 7.6 LeCun et al. 1998
K-Nearest Neighbors
K-nearest-neighbors, Euclidean (L2) none 5.0 LeCun et al. 1998
K-nearest-neighbors, Euclidean (L2) none 3.09 Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 none 2.83 Kenneth Wilder, U. Chicago
K-nearest-neighbors, Euclidean (L2) deskewing 2.4 LeCun et al. 1998
K-nearest-neighbors, Euclidean (L2) deskewing, noise removal, blurring 1.80 Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 deskewing, noise removal, blurring 1.73 Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 deskewing, noise removal, blurring, 1 pixel shift 1.33 Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 deskewing, noise removal, blurring, 2 pixel shift 1.22 Kenneth Wilder, U. Chicago
K-NN with non-linear deformation (IDM) shiftable edges 0.54 Keysers et al. IEEE PAMI 2007
K-NN with non-linear deformation (P2DHMDM) shiftable edges 0.52 Keysers et al. IEEE PAMI 2007
K-NN, Tangent Distance subsampling to 16×16 pixels 1.1 LeCun et al. 1998
K-NN, shape context matching shape context feature extraction 0.63 Belongie et al. IEEE PAMI 2002
Boosted Stumps
boosted stumps none 7.7 Kegl et al., ICML 2009
products of boosted stumps (3 terms) none 1.26 Kegl et al., ICML 2009
boosted trees (17 leaves) none 1.53 Kegl et al., ICML 2009
stumps on Haar features Haar features 1.02 Kegl et al., ICML 2009
product of stumps on Haar f. Haar features 0.87 Kegl et al., ICML 2009
Non-Linear Classifiers
40 PCA + quadratic classifier none 3.3 LeCun et al. 1998
1000 RBF + linear classifier none 3.6 LeCun et al. 1998
SVMs
SVM, Gaussian Kernel none 1.4
SVM deg 4 polynomial deskewing 1.1 LeCun et al. 1998
Reduced Set SVM deg 5 polynomial deskewing 1.0 LeCun et al. 1998
Virtual SVM deg-9 poly [distortions] none 0.8 LeCun et al. 1998
Virtual SVM, deg-9 poly, 1-pixel jittered none 0.68 DeCoste and Scholkopf, MLJ 2002
Virtual SVM, deg-9 poly, 1-pixel jittered deskewing 0.68 DeCoste and Scholkopf, MLJ 2002
Virtual SVM, deg-9 poly, 2-pixel jittered deskewing 0.56 DeCoste and Scholkopf, MLJ 2002
Neural Nets
2-layer NN, 300 hidden units, mean square error none 4.7 LeCun et al. 1998
2-layer NN, 300 HU, MSE, [distortions] none 3.6 LeCun et al. 1998
2-layer NN, 300 HU deskewing 1.6 LeCun et al. 1998
2-layer NN, 1000 hidden units none 4.5 LeCun et al. 1998
2-layer NN, 1000 HU, [distortions] none 3.8 LeCun et al. 1998
3-layer NN, 300+100 hidden units none 3.05 LeCun et al. 1998
3-layer NN, 300+100 HU [distortions] none 2.5 LeCun et al. 1998
3-layer NN, 500+150 hidden units none 2.95 LeCun et al. 1998
3-layer NN, 500+150 HU [distortions] none 2.45 LeCun et al. 1998
3-layer NN, 500+300 HU, softmax, cross entropy, weight decay none 1.53 Hinton, unpublished, 2005
2-layer NN, 800 HU, Cross-Entropy Loss none 1.6 Simard et al., ICDAR 2003
2-layer NN, 800 HU, cross-entropy [affine distortions] none 1.1 Simard et al., ICDAR 2003
2-layer NN, 800 HU, MSE [elastic distortions] none 0.9 Simard et al., ICDAR 2003
2-layer NN, 800 HU, cross-entropy [elastic distortions] none 0.7 Simard et al., ICDAR 2003
NN, 784-500-500-2000-30 + nearest neighbor, RBM + NCA training [no distortions] none 1.0 Salakhutdinov and Hinton, AI-Stats 2007
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU) [elastic distortions] none 0.35 Ciresan et al. Neural Computation 10, 2010 and arXiv 1003.0358, 2010
committee of 25 NN 784-800-10 [elastic distortions] width normalization, deslanting 0.39 Meier et al. ICDAR 2011
deep convex net, unsup pre-training [no distortions] none 0.83 Deng et al. Interspeech 2010
Convolutional nets
Convolutional net LeNet-1 subsampling to 16×16 pixels 1.7 LeCun et al. 1998
Convolutional net LeNet-4 none 1.1 LeCun et al. 1998
Convolutional net LeNet-4 with K-NN instead of last layer none 1.1 LeCun et al. 1998
Convolutional net LeNet-4 with local learning instead of last layer none 1.1 LeCun et al. 1998
Convolutional net LeNet-5, [no distortions] none 0.95 LeCun et al. 1998
Convolutional net LeNet-5, [huge distortions] none 0.85 LeCun et al. 1998
Convolutional net LeNet-5, [distortions] none 0.8 LeCun et al. 1998
Convolutional net Boosted LeNet-4, [distortions] none 0.7 LeCun et al. 1998
Trainable feature extractor + SVMs [no distortions] none 0.83 Lauer et al., Pattern Recognition 40-6, 2007
Trainable feature extractor + SVMs [elastic distortions] none 0.56 Lauer et al., Pattern Recognition 40-6, 2007
Trainable feature extractor + SVMs [affine distortions] none 0.54 Lauer et al., Pattern Recognition 40-6, 2007
unsupervised sparse features + SVM, [no distortions] none 0.59 Labusch et al., IEEE TNN 2008
Convolutional net, cross-entropy [affine distortions] none 0.6 Simard et al., ICDAR 2003
Convolutional net, cross-entropy [elastic distortions] none 0.4 Simard et al., ICDAR 2003
large conv. net, random features [no distortions] none 0.89 Ranzato et al., CVPR 2007
large conv. net, unsup features [no distortions] none 0.62 Ranzato et al., CVPR 2007
large conv. net, unsup pretraining [no distortions] none 0.60 Ranzato et al., NIPS 2006
large conv. net, unsup pretraining [elastic distortions] none 0.39 Ranzato et al., NIPS 2006
large conv. net, unsup pretraining [no distortions] none 0.53 Jarrett et al., ICCV 2009
large/deep conv. net, 1-20-40-60-80-100-120-120-10 [elastic distortions] none 0.35 Ciresan et al. IJCAI 2011
committee of 7 conv. net, 1-20-P-40-P-150-10 [elastic distortions] width normalization 0.27 +-0.02 Ciresan et al. ICDAR 2011
committee of 35 conv. net, 1-20-P-40-P-150-10 [elastic distortions] width normalization 0.23 Ciresan et al. CVPR 2012

 

 

『正確率』有百分之九十一以上,果真算不得什麼好?不過我們不多介紹這個程式庫,並非因為如此,實在是它連在樹莓派 3 上都跑得太慢了!況且要了解『神經網絡』,最好能了解它的基本原理,為此作者接續將串講一本 Michael Nielsen 所寫的『公開書』

Neural Networks and Deep Learning

,期能引領有興趣者入此門徑。

 

 

 

 

 

 

 

 

 

 

 

 

 

彼得原理

一小段 MBAlib 文本節錄︰

MBAlib

彼得原理

彼得原理(The Peter Principle)

 

彼得原理的概述

管理學家勞倫斯·彼得(Laurence.J.Peter),1919年生於加拿大的溫哥華,1957年獲美國華盛頓州立大學學士學位,6年後又獲得該校教育哲學博士學位,他閱歷豐富,博學多才,著述頗豐,他的名字還被收入了《美國名人榜》、《美國科學界名人錄》和《國際名人傳記辭典》等辭書中。

彼得原理(The Peter Principle)正是彼得根據千百個有關組織中不能勝任的失敗實例的分析而歸納出來的。其具體內容是:“在一個等級制度中,每個職工趨向於上升到他所不能勝任的地位”。彼得指出,每一個職工由於在原有職位上工作成績表現好(勝任),就將被提升到更高一級職位;其後,如果繼續勝任則將進一步被提升,直至到達他所不能勝任的職位。由此導出的推論是:“每一個職位最終都將被一個不能勝任其工作的職工所占據。層級組織的工作任務多半是由尚未達到勝任階層的員工完成的。”每一個職工最終都將達到彼得高地,在該處他的提升商數(PQ)為零。至於如何加速提升到這個高地,有兩種方法。其一,是上面的“拉動”,即依靠裙帶關係和熟人等從上面拉;其二,是自我的“推動”,即自我訓練和進步等,而前者是被普遍採用的。

彼得認為,彼得原理的推出,使他“無意間”創設了一門新的科學——層級組織學(Hierarchiology)。該科學是解開所有階層制度之謎的鑰匙,因此也是瞭解整個文明結構的關鍵所在。凡是置身於商業、工業、政治、行政、軍事、宗教、教育各界的每個人都和層級組織息息相關,亦都受彼得原理的控制。當然,原理的假設條件是:時間足夠長,層級組織裡有足夠的階層。彼得原理被認為是同帕金森定律有聯繫的。

 

彼得反轉原理

在對層級組織的研究中,彼得還分析歸納出彼德反轉原理:

一個員工的勝任與否,是由層級組織中的上司判定,而不是外界人士。如果上司已到達不勝任的階層,他或許會以制度的價值來評判部屬。例如,他會註重員工是否遵守規範、儀式、表格之類的事;他將特別贊賞工作迅速、整潔有禮的員工。總之,類似上司是以輸入(input)評斷部屬。於是對於那些把手段和目的的關係弄反了,方法重於目標、文書作業重於預定的目的、缺乏獨立判斷的自主權、只是服從而不作決定的職業性機械行為者而言,他們會被組織認為是能勝任的工作者,因此有資格獲得晉升,一直升到必須作決策的職務時,組織才會發現他們已到達不勝任的階層。而以顧客、客戶或受害者的觀點來看,他們本來就是不勝任的。

 

彼得原理的發展

諾斯古德·帕金森(C.N.Parkinson)是著名的社會理論家,他曾仔細觀察並有趣地描述層級組織中冗員累積的現象。他假設,組織中的高級主管採用分化和征服的策略,故意使組織效率降低,藉以提升自己的權勢,這種現象即帕金森所說的“爬升金字塔”。彼得認為這種理論設計是有缺陷的,他對員工累增現象給出的解釋是:層級組織的高級主管真誠追求效率(雖然徒勞無功)。正如彼得原理顯示的,許多或大多數主管必已到達他們的不勝任階層。這些人無法改進現有的狀況,因為所有的員工已經竭盡全力了,於是為了再增進效率,他們只好雇用更多的員工。員工的增加或許可以使效率暫時提升,但是這些新進的人員最後將因晉升過程而到達不勝任階層,於是唯一改善的方法就是再次增雇員工,再次獲得暫時的高效率,然後是另一次逐漸歸於無效率。這樣就使組織中的人數超過了工作的實際需要。

彼得原理首次公開發表於1960年9月美國聯邦出資的一次研習會上,聽眾是一群負責教育研究計劃、並剛獲晉升的項目主管,彼得認為他們多數人“只是拼命地想複製一些老掉牙了的統計習題”,於是引入彼得原理說明他們的困境。演說召來了敵意與嘲笑,但是彼得仍然決定以獨特的諷刺手法編寫彼得原理,儘管所有案例研究都經過精確編纂,且引用的資料也都符合事實,最後定稿於1965年春完成,然後總計有16家之多的出版社無情地拒絕了該書的手稿。1966年,作者零星地在報紙上發表了幾篇論述同一主題的文章,讀者的反應異常熱烈,引得各個出版社趨之若鶩。正如彼得在自傳中提到的,人偶爾會在鏡中瞥見自己的身影而不能立即自我辨認,於是在不自知前就加以嘲笑一番,這樣的片刻裡正好可以使人進一步認識自己,“彼得原理”扮演的正是那樣一面鏡子。

───

 

能帶給人們多少啟示呢?

論語中雍也篇的一句話︰

子曰:知之者,不如好之者;好之者,不如樂之者。

可否使『學習』者免於墜入『不勝任』之地步??大概不能的吧!然而『樂在學習』者並不在意『得失』,故使得他與『追求卓越』者不同!!既已出『好惡』之外,或許是『潛龍』的乎?

初九曰:潛龍勿用。何謂也?
子曰: 龍德而隱者也。不易乎世,不成乎名﹔遯世而無悶,不見是而無悶﹔樂則行之,憂則違之﹔確乎其不可拔,乾龍也。

,如是『彼得原理』與其何用哉??

 

難道『為了娛樂』不正是 Chris Meyers 寫作

Python for Fun

Purpose of this Collection

This collection is a presentation of several small Python programs. They are aimed at intermediate programmers; people who have studied Python and are fairly comfortable with basic recursion and object oriented techniques. Each program is very short, never more than a couple of pages and accompanied with a write-up.

I have found Python to be an excellent language to express algorithms clearly. Some of the ideas here originated in other programs in other languages. But in most cases I developed code from scratch from just an outline of an idea. However Lisp in Python was almost a translation exercise from John McCarthy’s original Evalquote in Lisp.

From many years of programming these are some of my favorite programs. I hope you enjoy them as much as I do. I look forward to hearing from readers, especially folks with suggestions for improvements, ideas for new projects, or people who are doing similar things. You can email me at mailme.html

Many thanks to Paul Carduner and Jeff Elkner for their work on this page, especially for Paul’s graphic of Psyltherin (apologies to Harry Potter) and to the Twisted development team for their Lore documentation generator to which all the other web pages in this collection have been recently adapted.

Chris Meyers

───

 

的原由嗎??也使得讀者得以『欣賞』,只用一百多行真可以完成『FORTH』編譯器的耶!!

#!/usr/local/bin/python
#
#   f o r t h . p y
#
import sys, re

ds       = []          # The data stack
cStack   = []          # The control struct stack
heap     = [0]*20      # The data heap
heapNext =  0          # Next avail slot in heap
words    = []          # The input stream of tokens

def main() :
    while 1 :
        pcode = compile()          # compile/run from user
        if pcode == None : print; return
        execute(pcode)

#============================== Lexical Parsing
        
def getWord (prompt="... ") :
    global words
    while not words : 
        try    : lin = raw_input(prompt)+"\n"
        except : return None
        if lin[0:1] == "@" : lin = open(lin[1:-1]).read()
        tokenizeWords(lin)
    word = words[0]
    words = words[1:]
    return word

def tokenizeWords(s) :
    global words                                          # clip comments, split to list of words
    words += re.sub("#.*\n","\n",s+"\n").lower().split()  # Use "#" for comment to end of line

#================================= Runtime operation

def execute (code) :
    p = 0
    while p < len(code) :
        func = code[p]
        p += 1
        newP = func(code,p)
        if newP != None : p = newP

def rAdd (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(a+b)
def rMul (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(a*b)
def rSub (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(a-b)
def rDiv (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(a/b)
def rEq  (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(int(a==b))
def rGt  (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(int(a>b))
def rLt  (cod,p) : b=ds.pop(); a=ds.pop(); ds.append(int(a<b))
def rSwap(cod,p) : a=ds.pop(); b=ds.pop(); ds.append(a); ds.append(b)
def rDup (cod,p) : ds.append(ds[-1])
def rDrop(cod,p) : ds.pop()
def rOver(cod,p) : ds.append(ds[-2])
def rDump(cod,p) : print "ds = ", ds
def rDot (cod,p) : print ds.pop()
def rJmp (cod,p) : return cod[p]
def rJnz (cod,p) : return (cod[p],p+1)[ds.pop()]
def rJz  (cod,p) : return (p+1,cod[p])[ds.pop()==0]
def rRun (cod,p) : execute(rDict[cod[p]]); return p+1
def rPush(cod,p) : ds.append(cod[p])     ; return p+1

def rCreate (pcode,p) :
    global heapNext, lastCreate
    lastCreate = label = getWord()      # match next word (input) to next heap address
    rDict[label] = [rPush, heapNext]    # when created word is run, pushes its address

def rDoes (cod,p) :
    rDict[lastCreate] += cod[p:]        # rest of words belong to created words runtime
    return len(cod)                     # jump p over these

def rAllot (cod,p) :
    global heapNext
    heapNext += ds.pop()                # reserve n words for last create

def rAt  (cod,p) : ds.append(heap[ds.pop()])       # get heap @ address
def rBang(cod,p) : a=ds.pop(); heap[a] = ds.pop()  # set heap @ address
def rComa(cod,p) :                                 # push tos into heap
    global heapNext
    heap[heapNext]=ds.pop()
    heapNext += 1

rDict = {
  '+'  : rAdd, '-'   : rSub, '/' : rDiv, '*'    : rMul,   'over': rOver,
  'dup': rDup, 'swap': rSwap, '.': rDot, 'dump' : rDump,  'drop': rDrop,
  '='  : rEq,  '>'   : rGt,   '<': rLt,
  ','  : rComa,'@'   : rAt, '!'  : rBang,'allot': rAllot,

  'create': rCreate, 'does>': rDoes,
}
#================================= Compile time 

def compile() :
    pcode = []; prompt = "Forth> "
    while 1 :
        word = getWord(prompt)  # get next word
        if word == None : return None
        cAct = cDict.get(word)  # Is there a compile time action ?
        rAct = rDict.get(word)  # Is there a runtime action ?

        if cAct : cAct(pcode)   # run at compile time
        elif rAct :
            if type(rAct) == type([]) :
                pcode.append(rRun)     # Compiled word.
                pcode.append(word)     # for now do dynamic lookup
            else : pcode.append(rAct)  # push builtin for runtime
        else :
            # Number to be pushed onto ds at runtime
            pcode.append(rPush)
            try : pcode.append(int(word))
            except :
                try: pcode.append(float(word))
                except : 
                    pcode[-1] = rRun     # Change rPush to rRun
                    pcode.append(word)   # Assume word will be defined
        if not cStack : return pcode
        prompt = "...    "
    
def fatal (mesg) : raise Exception(mesg)   # raising a bare string no longer works in modern Python

def cColon (pcode) :
    if cStack : fatal(": inside Control stack: %s" % cStack)
    label = getWord()
    cStack.append(("COLON",label))  # flag for following ";"

def cSemi (pcode) :
    if not cStack : fatal("No : for ; to match")
    code,label = cStack.pop()
    if code != "COLON" : fatal(": not balanced with ;")
    rDict[label] = pcode[:]       # Save word definition in rDict
    while pcode : pcode.pop()

def cBegin (pcode) :
    cStack.append(("BEGIN",len(pcode)))  # flag for following UNTIL

def cUntil (pcode) :
    if not cStack : fatal("No BEGIN for UNTIL to match")
    code,slot = cStack.pop()
    if code != "BEGIN" : fatal("UNTIL preceded by %s (not BEGIN)" % code)
    pcode.append(rJz)
    pcode.append(slot)

def cIf (pcode) :
    pcode.append(rJz)
    cStack.append(("IF",len(pcode)))  # flag for following Then or Else
    pcode.append(0)                   # slot to be filled in

def cElse (pcode) :
    if not cStack : fatal("No IF for ELSE to match")
    code,slot = cStack.pop()
    if code != "IF" : fatal("ELSE preceded by %s (not IF)" % code)
    pcode.append(rJmp)
    cStack.append(("ELSE",len(pcode)))  # flag for following THEN
    pcode.append(0)                     # slot to be filled in
    pcode[slot] = len(pcode)            # close JZ for IF

def cThen (pcode) :
    if not cStack : fatal("No IF or ELSE for THEN to match")
    code,slot = cStack.pop()
    if code not in ("IF","ELSE") : fatal("THEN preceded by %s (not IF or ELSE)" % code)
    pcode[slot] = len(pcode)             # close JZ for IF or JMP for ELSE

cDict = {
  ':'    : cColon, ';'    : cSemi, 'if': cIf, 'else': cElse, 'then': cThen,
  'begin': cBegin, 'until': cUntil,
}
  
if __name__ == "__main__" : main()
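
假設以 Python 2 執行上列 forth.py,可以這樣小試一番(以下互動紀錄僅為示意)︰

$ python forth.py
Forth> 2 3 + .
5
Forth> : square dup * ;
Forth> 7 square .
49
Forth> 1 2 3 dump
ds =  [1, 2, 3]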

 

 

 

 

 

 

 

 

 

 

 

 

 

 

莫非定律

《別子才司令》宋‧方岳
不如意事常八九,
可與語人無二三。
自識荆門子才甫,
夢馳鐵馬戰城南。

 

俗話說︰智者千慮終有一失,愚者千慮總有一得。這麼看來所謂的『人生不如意事十常八九』,是因為『思慮』難得『完備』之故!為什麼『思慮』無法『完備』的呢?是由於『偶發』之『無常』!那麼為什麼西諺也講︰

I never had a slice of bread particularly large and wide that did not fall upon the floor and always on the buttered side.

麵包落地的時候,永遠是抹奶油的一面著地。

 

雖我們曾在《布林代數》談及過這個『莫非定律』的由來︰

近年來根據美國方言學會(ADS,American Dialect Society)Stephen Goranson 的研究,一八七七年的一次工程學會會議上,Alfred Holt 在報告中提出︰

It is found that anything that can go wrong at sea generally does go wrong sooner or later, so it is not to be wondered that owners prefer the safe to the scientific …. Sufficient stress can hardly be laid on the advantages of simplicity. The human factor cannot be safely neglected in planning machinery. If attention is to be obtained, the engine must be such that the engineer will be disposed to attend to it.

,由此看來,說不定『莫非』Murphy 正是 De Morgan 之誤記︰

Mathematician Augustus De Morgan wrote on June 23, 1866: “The first experiment already illustrates a truth of the theory, well confirmed by practice, whatever can happen will happen if we make trials enough.” In later publications “whatever can happen will happen” occasionally is termed “Murphy’s law,” which raises the possibility — if something went wrong — that “Murphy” is “De Morgan” misremembered (an option, among others, raised by Goranson on American Dialect Society list).

這個大名鼎鼎的『莫非定律』說︰

凡事只要可能出錯,必定會出錯!!

從科學和演算法方面來講,它和『最糟情境』worst-case scenario 分析同義,然而就文化層面而言,它代表著一種反諷式的幽默,也許能排解日常生活中諸多遭遇的不滿。

那人們該如何設想『莫非之機率』的呢?一九零九年時法國數學家埃米爾‧博雷爾 Félix-Édouard-Justin-Émile Borel 在一本機率書中介紹了一個『打字猴子』的概念︰

讓一隻猴子在打字機上隨機地按鍵,當這樣作的時間趨近無窮時,似乎必然能夠打出任何指定的文本,比如說整套莎士比亞的著作。

他用這隻猴子來比喻一種能夠產生無窮的隨機語詞字串之『抽象設備』。這個『無限猴子定理』是說︰把一個很大但有限的數看成無限的推論是錯誤的。猴子能否完全無誤地敲打出一部莎士比亞的哈姆雷特?縱使它發生的機率非常之小,然而絕非是零!就像戰國時期的列禦寇在《列子‧湯問》中寫到︰

愚公移山

太行、王屋二山,方七百里,高萬仞,本在冀州之南,河陽之北。

北山愚公者,年且九十,面山而居。懲山北之塞,出入之迂也。聚室而謀曰:「吾與汝畢力平險,指通豫南,達於漢陰,可乎?」雜然相許。其妻獻疑曰:「以君之力,曾不能損魁父之丘,如太行、王屋何?且焉置土石?」雜曰:「投諸渤海之尾,隱土之北。」遂率子孫荷擔者三夫,叩石墾壤,箕畚運於渤海之尾。鄰人京城氏之孀妻有遺男,始齔,跳往助之。寒暑易節,始一反焉。

河曲智叟笑而止之曰:「甚矣,汝之不惠。以殘年餘力,曾不能毀山之一毛,其如土石何?」北山愚公長息曰:「汝心之固,固不可徹,曾不若孀妻弱子。雖我之死,有子存焉;子又生孫,孫又生子;子又有子,子又有孫;子子孫孫無窮匱也,而山不加增,何苦而不平?」河曲智叟亡以應。

操蛇之神聞之,懼其不已也,告之於帝。帝感其誠,命誇娥氏二子負二山,一厝朔東,一厝雍南。自此,冀之南,漢之陰,無隴斷焉。

───
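
回頭用一小段 Python 粗估『打字猴子』的機率直覺(字母表大小、目標字串與嘗試次數皆為假設)︰單次敲對的機率雖小,只要獨立嘗試的次數夠多,『至少成功一次』的機率便可任意接近 1。

# 猴子每次隨機敲出與目標等長的一段字母,估計「至少命中一次」的機率
alphabet = 26                            # 假設只有 26 個小寫字母
target   = "hamlet"                      # 假設的目標字串
p = (1.0 / alphabet) ** len(target)      # 單次完全敲對的機率

for n in (10**6, 10**9, 10**12):
    print(n, 1 - (1 - p) ** n)           # n 次獨立嘗試中至少成功一次的機率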

 

不過依舊很難解釋到底為什麼︰

凡事只要可能出錯,必定會出錯!!

彷彿總墜無窮思辯之中!!

設想只知道

\lim\limits_{Time \to \infty} IS_{OK}(Time) = 0

,果能邏輯推斷

\lim\limits_{Time \to \infty} Time \times IS_{OK}(Time) = \, ?

是多少的乎??
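
單憑 IS_{OK}(Time) 趨於零,確實推不出 Time \times IS_{OK}(Time) 的極限;答案全看 IS_{OK} 衰減的快慢。舉三個假設的例子(以 t 代 Time)︰

\lim\limits_{t \to \infty} t \cdot e^{-t} = 0 ,\quad \lim\limits_{t \to \infty} t \cdot \frac{1}{t} = 1 ,\quad \lim\limits_{t \to \infty} t \cdot \frac{1}{\sqrt{t}} = \infty

三者的 IS_{OK}(t) 都趨於零,乘上 t 之後卻分別趨於零、常數與無窮大,故僅憑前一式,答案不定。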

或許這正是古話︰

何謂『明智』?能將『事後之明』,用於『臨事』也。

希冀之事耶!!

莫非『派生』 Python 深知其理,於是有

29.12. inspect — Inspect live objects

Source code: Lib/inspect.py


The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.

There are four main kinds of services provided by this module: type checking, getting source code, inspecting classes and functions, and examining the interpreter stack.

───

 

程式庫的嗎??所以有『派生者』 Pythonian 追隨其理念,創作

Python object browser implemented in Qt

objbrowser

Extensible Python object inspection tool implemented in Qt.

Displays objects as trees and allows you to inspect their attributes recursively (e.g. browse through a list of dictionaries). You can add your own inspection methods as new columns to the tree view, or as radio buttons to the details pane. Altering existing inspection methods is possible as well.

Installation:

  1. Install PySide: http://qt-project.org/wiki/Category:LanguageBindings::PySide
  2. Run the installer:
pip install objbrowser

User interface:

[Screenshot: objbrowser user interface]

From the View menu you can select some extra columns, for instance the objects’ id column. This can also be done by right-clicking on the table header.

If the Show routine attributes from the View menu is checked, functions and methods that are attributes of the object are shown, otherwise they are hidden. Routines that are not an object attribute, for instance functions that are an element in a list, are always displayed.

If the Show special attributes from the View menu is checked, attributes whose names start and end with two underscores are displayed.

The details pane at the bottom shows object properties that do not fit on one line, such as the docstrings and the output of various functions of the inspect module from the Python standard library.

───

 

,將之宣揚於天下的也!!
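
舉個最小的例子,看看 inspect 模組如何「檢視活的物件」(以下僅為示意草稿;getargspec 為 Python 2 時代的 API,且 getsource 需以檔案方式執行才取得到原始碼)︰

import inspect

def greet(name, punct="!"):
    """回傳一句問候語。"""
    return "Hello, " + name + punct

print(inspect.getsource(greet))   # 函數的原始碼
print(inspect.getdoc(greet))      # 函數的 docstring
print(inspect.getargspec(greet))  # 參數列表
print([n for n, _ in inspect.getmembers(greet) if not n.startswith("__")])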