W!o+'s "Romance of the Little Weasel Workshop": The Ba Snake Devours an Elephant

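As the saying goes: "A human heart never satisfied is a snake swallowing an elephant." It describes excessive greed. According to the Wikipedia entry, the expression comes from the "Ba snake devours an elephant" passage of the Shan Hai Jing (Classic of Mountains and Seas):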

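Shan Hai Jing Jiao Zhu (Collated and Annotated Classic of Mountains and Seas) · Hainei Nan Jing (Classic of the Regions Within the Seas: South)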

  (Classic of Mountains and Seas, Book 10 · New Interpretation of the Sea Classics, Scroll 5)

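15. The Ba snake devours an elephant and after three years expels its bones; a gentleman who takes them will suffer no illness of the heart or belly①. As a snake it is blue-green, yellow, red, and black②. Another account says it is a black snake with a blue-green head③. It lives west of the rhinoceros.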

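① Guo Pu says: "Today in the south, the (虫丹) snake (the Daozang edition writes 蟒蛇, 'python'. Ke) swallows deer; once the deer has rotted, the snake coils itself tightly around a tree to crush what is in its belly, and the bones all pierce out between its scales. This is the same kind of creature. The Chu Ci says: 'There is a snake that swallows an elephant; how large must it be?' Commentators say it is a thousand xun long." Hao Yixing says: "The present text of the Chu Ci 'Tian Wen' reads 'a single snake swallows an elephant' (一蛇吞象), which differs from Guo's citation. Wang Yi's commentary quotes this classic as 'the numinous snake swallows an elephant' (靈蛇吞象); both differ from the current edition." Ke's note: The 'Ben Jing' chapter of the Huainanzi says: "Yi cut down the long snake (修蛇) at Dongting." The Lu Shi, Hou Ji 10, writes 修蛇 as 長它, and Luo Ping's commentary says: "The 長它 is the so-called Ba snake, found between the Yangtze and Yue. Its tomb is now at Baqiu in Baling, beside the prefectural seat. The Jiang Yuan Ji (that is, the Jiang Ji, by Yu Zhongyong of the Liu Song in the Six Dynasties. Ke) says: 'Yi slaughtered the Ba snake at Dongting; its bones were heaped like a hill, hence the name Baling.'" The Yueyang Fengtu Ji (by Fan Zhiming of the Song) also says: "Today the Ba snake □ stands beside the prefectural hall, towering high and overgrown with grass and trees. There is also a Ba Snake Temple inside the Yueyang gate." It further says: "Elephant Bone Mountain. The Shan Hai Jing says: 'The Ba snake swallows an elephant.' Its bones were laid out here to bleach. The lake beside the mountain is called Elephant Bone Harbor." All of these are myths grafted onto this classic and the Huainanzi. Yet since there are a tomb and a temple, a mountain and a harbor, all spoken of with such certainty, we know the story has long circulated among the people.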

② Ke's note: This describes its brilliant, variegated coloring.

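③ Ke's note: The Hainei Jing says: "There is Mount Basui, from which the Mian River flows. There is also the state of Zhujuan. There is a black snake with a blue-green head that eats elephants." That is this snake. The character 巴 (ba) is written □ in small seal script, and Shuowen, chapter 14, says: "A creature (蟲); some say: the elephant-eating snake. Pictographic." What it depicts, then, is the bulging shape of something inside a snake's belly. The Shan Hai Jing often mentions great snakes. The Bei Shan Jing, for example, says: "On Mount Daxian there is a snake called the long snake; its hair is like a boar's bristles, and its sound is like a watchman's clapper." The Bei Ci San Jing says: "On Mount Chunyu Wufeng there is a great snake with a red head and a white body; its sound is like an ox, and wherever it appears, its town suffers a great drought." Snakes like these could indeed "swallow an elephant." The Shui Jing Zhu, under the Yeyu River, says: "The mountains have many great snakes, called bearded snakes (髯蛇), ten zhang long and seven or eight chi around. They often lie in wait in the trees for deer and other beasts; when one passes, the snake lowers its head and coils around it. After a while the deer dies; the snake first soaks it until it is wet, then swallows it, and the head, antlers, and bones all bore out through its skin. When the hill tribesmen see a snake lying motionless, they skewer it from head to tail with large bamboo spikes, kill it, and eat it as a rare delicacy." This is the (虫丹) snake of Guo's note.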

───

 

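Yet judging from the Shan Hai Jing's own words, "a gentleman who takes them will suffer no illness of the heart or belly," there is no hint of greed at all! Guo Pu held that the Ba snake is a kind of python. Small wonder, then, that TensorFlow can build a handwritten digit recognizer in about twenty lines: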

MNIST For ML Beginners

This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is, and what softmax (multinomial logistic) regression is, you might prefer this faster paced tutorial. Be sure to install TensorFlow before starting either tutorial.

When one learns how to program, there’s a tradition that the first thing you do is print “Hello World.” Just like programming has Hello World, machine learning has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:

It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.
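The tutorial code loads the data with `one_hot=True`, which stores each label as a 10-element vector with a single 1 at the digit's index. A minimal pure-Python sketch of that encoding, using the four example labels above (the helper name `one_hot` is our own, not TensorFlow's):

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


labels = [5, 0, 4, 1]  # labels of the four example images
encoded = [one_hot(d) for d in labels]
for d, v in zip(labels, encoded):
    print(d, v)
```

Recovering the digit is just finding the index of the 1, which is what `tf.argmax(y_, 1)` does in the evaluation step.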

In this tutorial, we’re going to train a model to look at images and predict what digits they are. Our goal isn’t to train a really elaborate model that achieves state-of-the-art performance — although we’ll give you code to do that later! — but rather to dip a toe into using TensorFlow. As such, we’re going to start with a very simple model, called a Softmax Regression.
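Softmax regression computes y = softmax(xW + b): a weighted sum per class, then softmax to turn those scores into probabilities. The sketch below shows that forward pass in plain Python on hypothetical toy sizes (3 inputs and 2 classes instead of 784 and 10). Subtracting the maximum before exponentiating is a standard numerical-stability trick we chose; it does not change the result.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max score before exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_regression(x, W, b):
    """Return softmax(x·W + b) for one input vector x.

    x: n input features; W: n rows of k weights; b: k biases.
    """
    k = len(b)
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(k)]
    return softmax(logits)


# Hypothetical toy values, chosen only for illustration.
x = [1.0, 0.0, 2.0]
W = [[0.5, -0.5],
     [0.1, 0.2],
     [-0.3, 0.3]]
b = [0.0, 0.1]
y = softmax_regression(x, W, b)
print(y)
```

Note that the tutorial initializes W and b to zeros, so before training every class gets the same probability (0.1 for each of the 10 digits).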

The actual code for this tutorial is very short, and all the interesting stuff happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. Because of this, we are going to very carefully work through the code.

……

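# The MNIST Data
# Note: this is the TensorFlow 1.x graph API; the tutorials module,
# tf.placeholder and tf.Session are not available in TensorFlow 2.x.
from tensorflow.examples.tutorials.mnist import input_data
# Download (if needed) and load MNIST; one_hot=True stores each label as a 10-element 0/1 vector
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Implementing the Regression
import tensorflow as tf

# Input images: any number of rows, each a flattened 28x28 = 784-pixel image
x = tf.placeholder(tf.float32, [None, 784])

# Weights and biases, initialized to zero
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Predicted class probabilities: softmax(xW + b)
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Training
# True labels (one-hot)
y_ = tf.placeholder(tf.float32, [None, 10])

# Cross-entropy summed over the batch (tf.log(y) can yield inf/NaN if y underflows to 0)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

# One gradient-descent step with learning rate 0.01
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# tf.initialize_all_variables() is deprecated; tf.global_variables_initializer() replaces it
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

# 1000 training steps, each on a random mini-batch of 100 images
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluating Our Model
# A prediction is correct when the most probable class equals the labeled class
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
# Accuracy = fraction of correct predictions on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))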
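In the code above, `tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)` relies on TensorFlow's automatic differentiation. For softmax plus cross-entropy the gradient has a simple closed form: the derivative of the loss with respect to each class score is (predicted probability minus target). The sketch below writes one such gradient-descent step by hand, with the loss summed over the batch as in the tutorial. The toy data, learning rate 0.1, and helper names are hypothetical, chosen only so the effect is visible.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict(x, W, b):
    """softmax(x·W + b) for one example."""
    return softmax([sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
                    for j in range(len(b))])


def cross_entropy(batch_x, batch_y, W, b):
    """Summed over the batch, like -tf.reduce_sum(y_ * tf.log(y)).

    Zero targets are skipped, so log() is only taken where it contributes.
    """
    total = 0.0
    for x, t in zip(batch_x, batch_y):
        y = predict(x, W, b)
        total -= sum(tj * math.log(yj) for tj, yj in zip(t, y) if tj > 0)
    return total


def train_step(batch_x, batch_y, W, b, lr=0.01):
    """One plain gradient-descent step; updates W and b in place.

    Per example, d(loss)/d(score_j) = y_j - t_j, so
    d(loss)/dW[i][j] = x_i * (y_j - t_j) and d(loss)/db[j] = y_j - t_j.
    """
    n, k = len(W), len(b)
    gW = [[0.0] * k for _ in range(n)]
    gb = [0.0] * k
    for x, t in zip(batch_x, batch_y):
        y = predict(x, W, b)
        for j in range(k):
            d = y[j] - t[j]
            gb[j] += d
            for i in range(n):
                gW[i][j] += x[i] * d
    for i in range(n):
        for j in range(k):
            W[i][j] -= lr * gW[i][j]
    for j in range(k):
        b[j] -= lr * gb[j]


# Hypothetical toy batch: 2 features, 2 classes, one-hot targets.
batch_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
batch_y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]  # zero initialization, as in the tutorial
b = [0.0, 0.0]

before = cross_entropy(batch_x, batch_y, W, b)
for _ in range(1000):
    train_step(batch_x, batch_y, W, b, lr=0.1)
after = cross_entropy(batch_x, batch_y, W, b)
print(before, after)
```

With zero weights every prediction is 0.5, so the starting loss is 4·ln 2; repeated steps lower it. Because the loss is a sum rather than a mean, its size (and the effective step size) grows with the batch size, which is one reason later versions of such tutorials average over the batch instead.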



 

This should be about 91%.
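The accuracy figure comes from `tf.argmax`, `tf.equal`, and `tf.reduce_mean` in the evaluation code: take the most probable class in each row, compare it with the labeled class, and average the matches. A pure-Python sketch on four hypothetical predictions (ties here resolve to the first index; we do not claim that matches TensorFlow's tie behavior):

```python
def argmax(values):
    """Index of the largest value; the first one wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Hypothetical predicted probabilities and one-hot labels (3 classes).
preds = [[0.1, 0.7, 0.2],
         [0.8, 0.1, 0.1],
         [0.3, 0.3, 0.4],   # predicts class 2, but the label is class 1
         [0.2, 0.5, 0.3]]
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 1, 0]]
acc = accuracy(preds, labels)
print(acc)       # 0.75
print(1 - acc)   # error rate: 0.25
```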

Is that good? Well, not really. In fact, it’s pretty bad. This is because we’re using a very simple model. With some small changes, we can get to 97%. The best models can get to over 99.7% accuracy! (For more information, have a look at this list of results.)

What matters is that we learned from this model. Still, if you’re feeling a bit down about these results, check out the next tutorial where we do a lot better, and learn how to build more sophisticated models using TensorFlow!

───

 

Compare this with the error-rate table on THE MNIST DATABASE page:

Classifier | Preprocessing | Test error rate (%) | Reference

Linear Classifiers
linear classifier (1-layer NN) | none | 12.0 | LeCun et al. 1998
linear classifier (1-layer NN) | deskewing | 8.4 | LeCun et al. 1998
pairwise linear classifier | deskewing | 7.6 | LeCun et al. 1998

K-Nearest Neighbors
K-nearest-neighbors, Euclidean (L2) | none | 5.0 | LeCun et al. 1998
K-nearest-neighbors, Euclidean (L2) | none | 3.09 | Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 | none | 2.83 | Kenneth Wilder, U. Chicago
K-nearest-neighbors, Euclidean (L2) | deskewing | 2.4 | LeCun et al. 1998
K-nearest-neighbors, Euclidean (L2) | deskewing, noise removal, blurring | 1.80 | Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 | deskewing, noise removal, blurring | 1.73 | Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 | deskewing, noise removal, blurring, 1 pixel shift | 1.33 | Kenneth Wilder, U. Chicago
K-nearest-neighbors, L3 | deskewing, noise removal, blurring, 2 pixel shift | 1.22 | Kenneth Wilder, U. Chicago
K-NN with non-linear deformation (IDM) | shiftable edges | 0.54 | Keysers et al. IEEE PAMI 2007
K-NN with non-linear deformation (P2DHMDM) | shiftable edges | 0.52 | Keysers et al. IEEE PAMI 2007
K-NN, Tangent Distance | subsampling to 16×16 pixels | 1.1 | LeCun et al. 1998
K-NN, shape context matching | shape context feature extraction | 0.63 | Belongie et al. IEEE PAMI 2002

Boosted Stumps
boosted stumps | none | 7.7 | Kegl et al., ICML 2009
products of boosted stumps (3 terms) | none | 1.26 | Kegl et al., ICML 2009
boosted trees (17 leaves) | none | 1.53 | Kegl et al., ICML 2009
stumps on Haar features | Haar features | 1.02 | Kegl et al., ICML 2009
product of stumps on Haar f. | Haar features | 0.87 | Kegl et al., ICML 2009

Non-Linear Classifiers
40 PCA + quadratic classifier | none | 3.3 | LeCun et al. 1998
1000 RBF + linear classifier | none | 3.6 | LeCun et al. 1998

SVMs
SVM, Gaussian Kernel | none | 1.4 |
SVM deg 4 polynomial | deskewing | 1.1 | LeCun et al. 1998
Reduced Set SVM deg 5 polynomial | deskewing | 1.0 | LeCun et al. 1998
Virtual SVM deg-9 poly [distortions] | none | 0.8 | LeCun et al. 1998
Virtual SVM, deg-9 poly, 1-pixel jittered | none | 0.68 | DeCoste and Scholkopf, MLJ 2002
Virtual SVM, deg-9 poly, 1-pixel jittered | deskewing | 0.68 | DeCoste and Scholkopf, MLJ 2002
Virtual SVM, deg-9 poly, 2-pixel jittered | deskewing | 0.56 | DeCoste and Scholkopf, MLJ 2002

Neural Nets
2-layer NN, 300 hidden units, mean square error | none | 4.7 | LeCun et al. 1998
2-layer NN, 300 HU, MSE, [distortions] | none | 3.6 | LeCun et al. 1998
2-layer NN, 300 HU | deskewing | 1.6 | LeCun et al. 1998
2-layer NN, 1000 hidden units | none | 4.5 | LeCun et al. 1998
2-layer NN, 1000 HU, [distortions] | none | 3.8 | LeCun et al. 1998
3-layer NN, 300+100 hidden units | none | 3.05 | LeCun et al. 1998
3-layer NN, 300+100 HU [distortions] | none | 2.5 | LeCun et al. 1998
3-layer NN, 500+150 hidden units | none | 2.95 | LeCun et al. 1998
3-layer NN, 500+150 HU [distortions] | none | 2.45 | LeCun et al. 1998
3-layer NN, 500+300 HU, softmax, cross entropy, weight decay | none | 1.53 | Hinton, unpublished, 2005
2-layer NN, 800 HU, Cross-Entropy Loss | none | 1.6 | Simard et al., ICDAR 2003
2-layer NN, 800 HU, cross-entropy [affine distortions] | none | 1.1 | Simard et al., ICDAR 2003
2-layer NN, 800 HU, MSE [elastic distortions] | none | 0.9 | Simard et al., ICDAR 2003
2-layer NN, 800 HU, cross-entropy [elastic distortions] | none | 0.7 | Simard et al., ICDAR 2003
NN, 784-500-500-2000-30 + nearest neighbor, RBM + NCA training [no distortions] | none | 1.0 | Salakhutdinov and Hinton, AI-Stats 2007
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU) [elastic distortions] | none | 0.35 | Ciresan et al. Neural Computation 10, 2010 and arXiv 1003.0358, 2010
committee of 25 NN 784-800-10 [elastic distortions] | width normalization, deslanting | 0.39 | Meier et al. ICDAR 2011
deep convex net, unsup pre-training [no distortions] | none | 0.83 | Deng et al. Interspeech 2010

Convolutional nets
Convolutional net LeNet-1 | subsampling to 16×16 pixels | 1.7 | LeCun et al. 1998
Convolutional net LeNet-4 | none | 1.1 | LeCun et al. 1998
Convolutional net LeNet-4 with K-NN instead of last layer | none | 1.1 | LeCun et al. 1998
Convolutional net LeNet-4 with local learning instead of last layer | none | 1.1 | LeCun et al. 1998
Convolutional net LeNet-5, [no distortions] | none | 0.95 | LeCun et al. 1998
Convolutional net LeNet-5, [huge distortions] | none | 0.85 | LeCun et al. 1998
Convolutional net LeNet-5, [distortions] | none | 0.8 | LeCun et al. 1998
Convolutional net Boosted LeNet-4, [distortions] | none | 0.7 | LeCun et al. 1998
Trainable feature extractor + SVMs [no distortions] | none | 0.83 | Lauer et al., Pattern Recognition 40-6, 2007
Trainable feature extractor + SVMs [elastic distortions] | none | 0.56 | Lauer et al., Pattern Recognition 40-6, 2007
Trainable feature extractor + SVMs [affine distortions] | none | 0.54 | Lauer et al., Pattern Recognition 40-6, 2007
unsupervised sparse features + SVM, [no distortions] | none | 0.59 | Labusch et al., IEEE TNN 2008
Convolutional net, cross-entropy [affine distortions] | none | 0.6 | Simard et al., ICDAR 2003
Convolutional net, cross-entropy [elastic distortions] | none | 0.4 | Simard et al., ICDAR 2003
large conv. net, random features [no distortions] | none | 0.89 | Ranzato et al., CVPR 2007
large conv. net, unsup features [no distortions] | none | 0.62 | Ranzato et al., CVPR 2007
large conv. net, unsup pretraining [no distortions] | none | 0.60 | Ranzato et al., NIPS 2006
large conv. net, unsup pretraining [elastic distortions] | none | 0.39 | Ranzato et al., NIPS 2006
large conv. net, unsup pretraining [no distortions] | none | 0.53 | Jarrett et al., ICCV 2009
large/deep conv. net, 1-20-40-60-80-100-120-120-10 [elastic distortions] | none | 0.35 | Ciresan et al. IJCAI 2011
committee of 7 conv. net, 1-20-P-40-P-150-10 [elastic distortions] | width normalization | 0.27 +-0.02 | Ciresan et al. ICDAR 2011
committee of 35 conv. net, 1-20-P-40-P-150-10 [elastic distortions] | width normalization | 0.23 | Ciresan et al. CVPR 2012
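The tutorial reports accuracy, while the table lists test error rates, so the comparison needs one subtraction: error rate = 100% minus accuracy. A small sketch, taking "about 91%" as exactly 91.0 (an assumption) and a handful of rows copied from the table above (the selection is ours), shows which entries a 9% error rate would beat:

```python
# A few rows from the MNIST table above: (classifier and preprocessing, test error %).
ROWS = [
    ("linear classifier (1-layer NN), none", 12.0),
    ("linear classifier (1-layer NN), deskewing", 8.4),
    ("pairwise linear classifier, deskewing", 7.6),
    ("boosted stumps, none", 7.7),
    ("2-layer NN, 300 hidden units, mean square error, none", 4.7),
    ("committee of 35 conv. net, width normalization", 0.23),
]


def error_rate(accuracy_percent):
    """Convert an accuracy in percent to an error rate in percent."""
    return 100.0 - accuracy_percent


def rows_beaten(accuracy_percent, rows=ROWS):
    """Names of table rows whose error rate is higher than ours."""
    err = error_rate(accuracy_percent)
    return [name for name, e in rows if err < e]


print(error_rate(91.0))   # 9.0
print(rows_beaten(91.0))
```

Among these rows, about 91% accuracy only beats the 1998 linear classifier without preprocessing, which fits the tutorial's own verdict that the simple model is "pretty bad."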

 

 

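An accuracy of over 91%: is that really nothing special? Still, that is not why we will say little more about this library. The real reason is that it runs far too slowly, even on a Raspberry Pi 3! Moreover, to understand neural networks it is best to understand their basic principles, so the author will next work through an "open book" written by Michael Nielsen,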

Neural Networks and Deep Learning

in the hope of guiding interested readers through this gateway.