W!o+'s 《小伶鼬工坊演義》: Neural Networks [FFT], Part 2

What is a "feature" of a thing, and why does the method used to extract it matter? The Wikipedia entry puts it this way:

Feature extraction

In machine learning, pattern recognition and in image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations. Feature extraction is related to dimensionality reduction.

When the input data to an algorithm is too large to be processed and it is suspected to be redundant (e.g. the same measurement in both feet and meters, or the repetitiveness of images presented as pixels), then it can be transformed into a reduced set of features (also named a features vector). This process is called feature selection. The selected features are expected to contain the relevant information from the input data, so that the desired task can be performed by using this reduced representation instead of the complete initial data.

───

 

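Suppose we ask how to "define" a thing. Perhaps a "feature" is simply a "defining trait", something that lets us tell different things apart. That is why people naturally know that dogs (汪星人) are not cats (喵星人).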

This prompts some curiosity: sound has a "tone" of its own, which can be explored through the

Cepstrum

A cepstrum (/ˈkɛpstrəm/, /ˈsɛpstrəm/) is the result of taking the Inverse Fourier transform (IFT) of the logarithm of the estimated spectrum of a signal. It may be pronounced in the two ways given, the second having the advantage of avoiding confusion with ‘kepstrum’ which also exists (see below). There is a complex cepstrum, a real cepstrum, a power cepstrum, and a phase cepstrum. The power cepstrum in particular finds applications in the analysis of human speech.

The name “cepstrum” was derived by reversing the first four letters of “spectrum”. Operations on cepstra are labelled quefrency analysis (aka quefrency alanysis[1]), liftering, or cepstral analysis.

Cepstrum_signal_analysis

Steps in forming cepstrum from time history

───

 

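That is how sound is explored. Does an image, then, have a "tone" as well? Can we study it by following the same recipe? Whether one is the slow bird that starts early, the novice that forgets to fly, or the veteran that has already flown, the history of science and technology is full of great achievements that grew out of "silly questions". So why not try the same thing here?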

[Again, the digit 5]

Figure 5

 

>>> img = training_data[0][0].reshape(28,28)
>>> f_img = network.np.fft.rfft2(img)
>>> logp_img = 2*network.np.log(network.np.abs(f_img))
>>> plt.imshow(logp_img)
<matplotlib.image.AxesImage object at 0x51af290>
>>> plt.show()

 

Figure 5p

 

>>> ilogpf_img = network.np.fft.irfft2(logp_img)
>>> cf_img = network.np.abs(ilogpf_img)**2
>>> plt.imshow(cf_img)
<matplotlib.image.AxesImage object at 0x51d1050>
>>> plt.show()

 

Figure 5c

 

[And again, the digit 0]

Figure 0

 

>>> img1 = training_data[1][0].reshape(28,28)
>>> f_img1 = network.np.fft.rfft2(img1)
>>> logp_img1 = 2*network.np.log(network.np.abs(f_img1))
>>> plt.imshow(logp_img1)
<matplotlib.image.AxesImage object at 0x5091e50>
>>> plt.show()

 

Figure 0p

 

>>> ilogpf_img1 = network.np.fft.irfft2(logp_img1)
>>> cf_img1 = network.np.abs(ilogpf_img1)**2
>>> plt.imshow(cf_img1)
<matplotlib.image.AxesImage object at 0x51b9b50>
>>> plt.show()
>>> 

 

Figure 0c
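
The REPL steps above can be collected into a single function. What follows is a minimal sketch rather than the author's code: the name `power_cepstrum_2d` and the small constant `eps` are assumptions added for illustration (`eps` keeps the logarithm finite where the spectrum is exactly zero), and a random array stands in for an MNIST digit so that the block runs without `mnist_loader`.

```python
import numpy as np

def power_cepstrum_2d(img, eps=1e-12):
    """Power cepstrum of a real 2-D array: |IFFT(log |FFT(img)|^2)|^2."""
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.rfft2(img)
    # log power spectrum; eps (an assumption) avoids log(0)
    log_power = np.log(np.abs(spectrum) ** 2 + eps)
    # pass the shape explicitly so odd-sized inputs come back at their original size
    cepstrum = np.fft.irfft2(log_power, s=img.shape)
    return np.abs(cepstrum) ** 2

# Stand-in for training_data[0][0].reshape(28, 28)
demo = np.random.RandomState(0).rand(28, 28)
cep = power_cepstrum_2d(demo)
```

With the MNIST data loaded as in the transcripts, `power_cepstrum_2d(training_data[0][0].reshape(28, 28))` should give an array comparable to `cf_img`, up to the effect of `eps`.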

 

[References]

Discrete Fourier Transform (numpy.fft)

Standard FFTs

fft(a[, n, axis, norm]) Compute the one-dimensional discrete Fourier Transform.
ifft(a[, n, axis, norm]) Compute the one-dimensional inverse discrete Fourier Transform.
fft2(a[, s, axes, norm]) Compute the 2-dimensional discrete Fourier Transform. This function computes the n-dimensional discrete Fourier Transform over any axes in an M-dimensional array by means of the Fast Fourier Transform (FFT).
ifft2(a[, s, axes, norm]) Compute the 2-dimensional inverse discrete Fourier Transform.
fftn(a[, s, axes, norm]) Compute the N-dimensional discrete Fourier Transform.
ifftn(a[, s, axes, norm]) Compute the N-dimensional inverse discrete Fourier Transform.

Real FFTs

rfft(a[, n, axis, norm]) Compute the one-dimensional discrete Fourier Transform for real input.
irfft(a[, n, axis, norm]) Compute the inverse of the n-point DFT for real input.
rfft2(a[, s, axes, norm]) Compute the 2-dimensional FFT of a real array.
irfft2(a[, s, axes, norm]) Compute the 2-dimensional inverse FFT of a real array.
rfftn(a[, s, axes, norm]) Compute the N-dimensional discrete Fourier Transform for real input.
irfftn(a[, s, axes, norm]) Compute the inverse of the N-dimensional FFT of real input.

Hermitian FFTs

hfft(a[, n, axis, norm]) Compute the FFT of a signal which has Hermitian symmetry (real spectrum).
ihfft(a[, n, axis, norm]) Compute the inverse FFT of a signal which has Hermitian symmetry.

Helper routines

fftfreq(n[, d]) Return the Discrete Fourier Transform sample frequencies.
rfftfreq(n[, d]) Return the Discrete Fourier Transform sample frequencies (for usage with rfft, irfft).
fftshift(x[, axes]) Shift the zero-frequency component to the center of the spectrum.
ifftshift(x[, axes]) The inverse of fftshift.

Background information

Fourier analysis is fundamentally a method for expressing a function as a sum of periodic components, and for recovering the function from those components. When both the function and its Fourier transform are replaced with discretized counterparts, it is called the discrete Fourier transform (DFT). The DFT has become a mainstay of numerical computing in part because of a very fast algorithm for computing it, called the Fast Fourier Transform (FFT), which was known to Gauss (1805) and was brought to light in its current form by Cooley and Tukey [CT]. Press et al. [NR] provide an accessible introduction to Fourier analysis and its applications.

Because the discrete Fourier transform separates its input into components that contribute at discrete frequencies, it has a great number of applications in digital signal processing, e.g., for filtering, and in this context the discretized input to the transform is customarily referred to as a signal, which exists in the time domain. The output is called a spectrum or transform and exists in the frequency domain.

Implementation details

There are many ways to define the DFT, varying in the sign of the exponent, normalization, etc. In this implementation, the DFT is defined as

A_k = \sum_{m=0}^{n-1} a_m \exp\left\{-2\pi i{mk \over n}\right\} \qquad k = 0,\ldots,n-1.

The DFT is in general defined for complex inputs and outputs, and a single-frequency component at linear frequency f is represented by a complex exponential a_m = \exp\{2\pi i\,f m\Delta t\}, where \Delta t is the sampling interval.

The values in the result follow so-called “standard” order: If A = fft(a, n), then A[0] contains the zero-frequency term (the sum of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency. For an even number of input points, A[n/2] represents both positive and negative Nyquist frequency, and is also purely real for real input. For an odd number of input points, A[(n-1)/2] contains the largest positive frequency, while A[(n+1)/2] contains the largest negative frequency. The routine np.fft.fftfreq(n) returns an array giving the frequencies of corresponding elements in the output. The routine np.fft.fftshift(A) shifts transforms and their frequencies to put the zero-frequency components in the middle, and np.fft.ifftshift(A) undoes that shift.

When the input a is a time-domain signal and A = fft(a), np.abs(A) is its amplitude spectrum and np.abs(A)**2 is its power spectrum. The phase spectrum is obtained by np.angle(A).

The inverse DFT is defined as

a_m = \frac{1}{n}\sum_{k=0}^{n-1}A_k\exp\left\{2\pi i{mk\over n}\right\} \qquad m = 0,\ldots,n-1.

It differs from the forward transform by the sign of the exponential argument and the default normalization by 1/n.

───
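
The conventions quoted above (standard output order, `fftfreq`, `fftshift`, and the amplitude, power, and phase spectra) can be checked on a tiny signal. This is only an illustrative sketch; the eight-sample test signal is an assumption chosen so that the expected bins are easy to read off.

```python
import numpy as np

n = 8
t = np.arange(n)
# Constant offset 1 plus a cosine with 2 cycles over the n samples
a = 1.0 + np.cos(2 * np.pi * 2 * t / n)

A = np.fft.fft(a)               # "standard" order: zero frequency first
freqs = np.fft.fftfreq(n)       # frequency (cycles per sample) of each bin

amplitude = np.abs(A)           # amplitude spectrum
power = np.abs(A) ** 2          # power spectrum
phase = np.angle(A)             # phase spectrum

centered = np.fft.fftshift(A)   # zero frequency moved to the middle
restored = np.fft.ifft(A)       # inverse transform applies the 1/n factor

print(freqs)
print(np.round(amplitude, 6))
```

Bin 0 holds the sum of the samples, and the cosine shows up symmetrically at the bins whose frequencies are +0.25 and -0.25.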

 

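Surprisingly, the results look very much alike, yet they also seem a little different. So should we say the approach "works" or "does not work"?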


W!o+'s 《小伶鼬工坊演義》: Neural Networks [FFT], Part 1

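Anyone working in the field of image processing is expected to know the

Fast Fourier Transform

The fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse. Fourier analysis converts a signal from its original domain (usually time or space) to a representation in the frequency domain, and vice versa. An FFT computes such transforms quickly by factorizing the DFT matrix into a product of sparse (mostly zero) factors.[1] As a result, it reduces the complexity of computing the DFT from O(n^2), which is what applying the DFT definition directly requires, to O(n \log n), where n is the data size.

Fast Fourier transforms are widely used in engineering, science, and mathematics. The basic ideas became popular in 1965, but had been derived as early as 1805.[2] In 1994 Gilbert Strang described the FFT as "the most important numerical algorithm of our lifetime",[3] and it was included among the top 10 algorithms of the 20th century by the IEEE journal Computing in Science & Engineering.[4]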

───
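
The drop from O(n^2) to O(n \log n) described in the quoted entry can be illustrated with a short recursive radix-2 decimation-in-time sketch. It is only one of many FFT variants and not necessarily what numpy uses internally; the function names `dft` and `fft` are chosen here for illustration, and the input length is assumed to be a power of two.

```python
import cmath

def dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

data = [1, 2, 3, 4, 0, -1, -2, -3]
fast = fft(data)
slow = dft(data)
```

Each level of recursion does O(n) work combining the two half-size transforms, and there are log2(n) levels, which is where the O(n \log n) count comes from.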

 

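Not knowing it would be almost unthinkable! As for whether handwritten Arabic digit recognition can be handled in the "spatial domain" of the handwritten digits, that is truly a big question.

Let us take a quick look at that spatial-domain image: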

>>> import mnist_loader
>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper()
>>> import network
>>> net = network.Network([784, 30, 10])
>>> npzfile = network.np.load("swb.npz")
>>> net.weights[0] = npzfile["w1"]
>>> net.weights[1] = npzfile["w2"]
>>> net.biases[0] = npzfile["b1"]
>>> net.biases[1] = npzfile["b2"]
>>> import matplotlib.pyplot as plt
>>> img = training_data[0][0].reshape(28,28)
>>> plt.imshow(img,cmap='Greys', interpolation='nearest')
<matplotlib.image.AxesImage object at 0x56e33d0>
>>> plt.show()
>>>

 

[Original image of the digit 5]

Figure 5

 

 

>>> f_img = network.np.fft.fft2(img)
>>> sf_img = network.np.fft.fftshift(f_img)
>>> dbf_img = 20*network.np.log10(network.np.abs(sf_img))
>>> plt.imshow(dbf_img, cmap='Greys', interpolation='nearest')
<matplotlib.image.AxesImage object at 0x570a150>
>>> plt.show()
>>>

 

[FFT dB spectrum of the digit 5]

Figure 5_fft_db

 

 

>>> phase_img = network.np.angle(f_img)
>>> plt.imshow(phase_img, cmap='Greys', interpolation='nearest')
<matplotlib.image.AxesImage object at 0x51bd690>
>>> plt.show()
>>> 

 

[FFT phase spectrum of the digit 5]

Figure 5_fft_phase

 

 

>>> iphase_img = network.np.fft.ifft2(network.np.exp(1j*phase_img))
>>> iphase_img_p = network.np.abs(iphase_img)
>>> plt.imshow(iphase_img_p, cmap='Greys', interpolation='nearest')
<matplotlib.image.AxesImage object at 0x51c0d90>
>>> plt.show()
>>>

 

[Reconstruction from the phase spectrum alone]

Figure 5_phase_ifft

 

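All of this involves complex numbers (Complex number):

Complex numbers are an extension of the real numbers under which every polynomial equation has a root. Among the complex numbers is the "imaginary unit" i, a square root of -1, that is, i^2 = -1. Every complex number can be written as x + yi, where x and y are both real numbers, called the "real part" and the "imaginary part" of the complex number respectively.

Complex numbers were discovered through the expressions for the roots of cubic equations. In mathematics, the word "complex" indicates that the number field under discussion is the complex numbers, as in complex matrix, complex function, and so on.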

───

 

How, then, should the "learning rule" be set up? Interested readers may want to visit:

Welcome

 

The Computational Intelligence Laboratory (CIL) is doing research in the areas of Complex-Valued Neural Networks and Intelligent Image Processing. The CIL is an integrated part of the College of Science, Technology, Engineering and Mathematics of Texas A&M University-Texarkana.

Our research on Complex-Valued Neural Networks is concentrated on the development of the Multi-Valued Neuron (MVN) and MVN-based neural networks paradigms.

Our research on Intelligent Image Processing is concentrated on applications of MVN-based neural networks in image processing and image recognition.

The Director of the Laboratory is Dr. Igor Aizenberg.

An NSF Grant Recipient in 2009-2012


───

 

Complex-Valued Neurons

Complex-Valued Neural Networks

The primarily CIL research area is Complex-Valued Neural Networks (CVNNs), mainly Multi-Valued Neurons and neural networks based on them.

Complex-Valued Neural Networks become increasingly popular. The use of complex-valued inputs/outputs, weights and activation functions make it possible to increase the functionality of a single neuron and of a neural network, to improve their performance and to reduce the training time.

The history of complex numbers shows that although it took a long time for them to be accepted (almost 300 years from the first reference to “imaginary numbers” by Girolamo Cardano in 1545 to Leonard Euler’s and Carl Friedrich Gauss’ works published in 1748 and 1831, respectively), they have become an integral part of engineering and mathematics. It is difficult to imagine today how signal processing, aerodynamics, hydrodynamics, energy science, quantum mechanics, circuit analysis, and many other areas of engineering and science could develop without complex numbers. It is a fundamental mathematical fact that complex numbers are a necessary and absolutely natural part of numerical world. Their necessity clearly follows from the Fundamental Theorem of Algebra, which states that every non-constant single-variable polynomial of degree n with complex coefficients has exactly n complex roots, if each root is counted up to its multiplicity.

Answering a question frequently asked by some “conservative” people, what one can get using complex-valued neural networks (“twice more” parameters, more computations, etc.), we may say that one may get the same as using the Fourier transform, but not just the Walsh transform in signal processing. There are many engineering problems in the modern world where complex-valued signals and functions of complex variables are involved and where they are unavoidable. Thus, to employ neural networks for their analysis, approximation, etc., the use of complex-valued neural networks is natural. However, even in the analysis of real-valued signals (for example, images or audio signals) one of the most frequently used approaches is frequency domain analysis, which immediately leads us to the complex domain. In fact, analyzing signal properties in the frequency domain, we see that each signal is characterized by magnitude and phase that carry different information about the signal. This fundamental fact was deeply discovered by A.V. Oppenheim and J.S. Lim in their paper “The importance of phase in signals”, IEEE Proceedings, v. 69, No 5, 1981,pp.: 529- 541. They have shown that the phase in the Fourier spectrum of a signal is much more informative than the magnitude: particularly in the Fourier spectrum of images, just phase contains the information about all shapes, edges, orientation of all objects.

This property can be illustrated by the following example. Let us consider two popular test images “Lena” and “Bridge”.

 
Lena
 
Bridge

 

Let us take their Fourier transforms and then let us swap magnitude and phase of their Fourier spectra combining the phase of “Lena” with the magnitude of “Bridge” and vice versa. After taking the inverse Fourier transform we clearly realize that those images were restored whose phases were combined with the counterpart magnitude:

 
Restored from Lena Phase + Bridge Magnitude
 
Restored from Bridge phase + Lena Magnitude

 

Thus, in fact, phase contains information of what is represented by the corresponding signal. To use this information properly, the most appropriate solution is movement to the complex domain. Hence, one of the most important characteristics of Complex-Valued Neural Networks is the proper treatment of amplitude and phase information, e.g., the treatment of wave-related phenomena such as electromagnetism, light waves, quantum waves and oscillatory phenomenon.
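
The magnitude and phase swap described above can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the CIL code: the function name `swap_phase_and_magnitude` is hypothetical, and two synthetic bar images stand in for "Lena" and "Bridge".

```python
import numpy as np

def swap_phase_and_magnitude(img_a, img_b):
    """Return (phase of a + magnitude of b, phase of b + magnitude of a)."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    # Rebuild each spectrum as magnitude * exp(i * phase), then invert.
    a_phase_b_mag = np.fft.ifft2(np.abs(Fb) * np.exp(1j * np.angle(Fa)))
    b_phase_a_mag = np.fft.ifft2(np.abs(Fa) * np.exp(1j * np.angle(Fb)))
    # For real inputs the imaginary parts are rounding noise only.
    return a_phase_b_mag.real, b_phase_a_mag.real

# Synthetic stand-ins: a vertical bar and a horizontal bar
vertical = np.zeros((28, 28))
vertical[:, 12:16] = 1.0
horizontal = np.zeros((28, 28))
horizontal[12:16, :] = 1.0

from_vertical_phase, from_horizontal_phase = swap_phase_and_magnitude(vertical, horizontal)
```

Displaying the two results with `plt.imshow` is one way to judge by eye which input each reconstruction resembles; two digits from `training_data` could be used in place of the bars.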

───

 

You may also want to read

RealvsComplex

https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2011-42.pdf

 

to understand a little more.


W!o+'s 《小伶鼬工坊演義》: Neural Networks [hyper-parameters], Part 4

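Today is May Fourth once again. I wonder whether "Mr. De" (Democracy) ever came, and whether "Mr. Sai" (Science) ever stayed for long. Yet the world is still ablaze with conflict, and the humanitarian spirit is being tempered in the fire. It brings to mind

The Analects, Xue Er (《論語》‧學而)

Zigong said: "Poor yet not fawning, rich yet not arrogant: what do you think of that?"

The Master said: "That will do. But it is not as good as being poor yet joyful, or rich yet fond of the rites."

Zigong said: "The Book of Songs says, 'As if cut, as if filed; as if carved, as if polished.' Is that what you mean?"

The Master said: "Ci, now I can begin to discuss the Songs with you! Told what has gone before, you know what is to come."

and makes one sigh: are greed, anger, and delusion truly just the way of this Saha world?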

At the end of this chapter, rather than saying that Michael Nielsen offers a "conclusion":

Toward deep learning

While our neural network gives impressive performance, that performance is somewhat mysterious. The weights and biases in the network were discovered automatically. And that means we don’t immediately have an explanation of how the network does what it does. Can we find some way to understand the principles by which our network is classifying handwritten digits? And, given such principles, can we do better?

To put these questions more starkly, suppose that a few decades hence neural networks lead to artificial intelligence (AI). Will we understand how such intelligent networks work? Perhaps the networks will be opaque to us, with weights and biases we don’t understand, because they’ve been learned automatically. In the early days of AI research people hoped that the effort to build an AI would also help us understand the principles behind intelligence and, maybe, the functioning of the human brain. But perhaps the outcome will be that we end up understanding neither the brain nor how artificial intelligence works!

To address these questions, let’s think back to the interpretation of artificial neurons that I gave at the start of the chapter, as a means of weighing evidence. Suppose we want to determine whether an image shows a human face or not:

Credits: 1. Ester Inbar. 2. Unknown. 3. NASA, ESA, G. Illingworth, D. Magee, and P. Oesch (University of California, Santa Cruz), R. Bouwens (Leiden University), and the HUDF09 Team.

……

The end result is a network which breaks down a very complicated question – does this image show a face or not – into very simple questions answerable at the level of single pixels. It does this through a series of many layers, with early layers answering very simple and specific questions about the input image, and later layers building up a hierarchy of ever more complex and abstract concepts. Networks with this kind of many-layer structure – two or more hidden layers – are called deep neural networks.

Of course, I haven’t said how to do this recursive decomposition into sub-networks. It certainly isn’t practical to hand-design the weights and biases in the network. Instead, we’d like to use learning algorithms so that the network can automatically learn the weights and biases – and thus, the hierarchy of concepts – from training data. Researchers in the 1980s and 1990s tried using stochastic gradient descent and backpropagation to train deep networks. Unfortunately, except for a few special architectures, they didn’t have much luck. The networks would learn, but very slowly, and in practice often too slowly to be useful.

Since 2006, a set of techniques has been developed that enable learning in deep neural nets. These deep learning techniques are based on stochastic gradient descent and backpropagation, but also introduce new ideas. These techniques have enabled much deeper (and larger) networks to be trained – people now routinely train networks with 5 to 10 hidden layers. And, it turns out that these perform far better on many problems than shallow neural networks, i.e., networks with just a single hidden layer. The reason, of course, is the ability of deep nets to build up a complex hierarchy of concepts. It’s a bit like the way conventional programming languages use modular design and ideas about abstraction to enable the creation of complex computer programs. Comparing a deep network to a shallow network is a bit like comparing a programming language with the ability to make function calls to a stripped down language with no ability to make such calls. Abstraction takes a different form in neural networks than it does in conventional programming, but it’s just as important.

───

 

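it would be better to call it simply an "exhortation to learning":

Message: ☿ Those who pass the wine cups together walk the same path; to follow Xunzi's 《勸學篇》 ("Exhortation to Learning"), start while young: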

君子曰:學不可以已。青,取之於藍而青於藍;冰,水為之而寒於水。以喻學則才過其本性也。木直中繩,輮以為輪,其曲中規,雖有槁暴,不復挺者,輮使之然也。輮,屈。槁,枯。曓,乾。挻,宜也。《晏子春秋》作「不復贏也」。故木受繩則直,金就礪則利,君子博學而日參省乎己,則知明而行無過矣。參,三也。曾子曰︰「日三省吾身。」知,讀爲智。行,下孟反。故不登高山,不知天之高也;不臨深谿,不知地之厚也;不聞先王之遺言,不知學問之大也。大,謂有益於人。干、越、夷、貉之子,生而同聲,長而異俗,教使之然也。干、越,猶言吳、越。《呂氏春秋》「荊有次非得寶劍於干、越」,高誘曰︰「吳邑也。」貉,東北夷。同聲,謂啼聲同。貉,莫革反。《詩》曰:「嗟爾君子,無恆安息。靖共爾位,好是正直。神之聽之,介爾景福。」《詩》,《小雅‧小明》之篇。靖,謀。介,助。景,大也。無恆安息,戒之不使懷安也。言能謀恭其位,好正宜之道,則神聽而助之福,引此詩以喻勤學也。神莫大於化道,福莫長於無禍。爲學則自化道,故神莫大焉。修身則自無禍,故福莫長焉。吾嘗終日而思矣,不如須臾之所學也,吾嘗跂而望矣,不如登高之博見也 。跂,舉足也。登高而招,臂非加長也,而見者遠;順風而呼,聲非加疾也,而聞者彰。假輿馬者,非利足也,而致千里;假舟楫者,非能水也,而絕江河。能,善。絶,過。君子生非異也,善假於物也。皆以喻修身在假於學。生非異,言與衆人同也。南方有鳥焉,名曰蒙鳩,以羽為巢而編之以髮,繫之葦苕,風至苕折,卵破子死。巢非不完也 ,所繫者然也。蒙 鳩,鷦鷯也。苕,葦之秀也,今巧婦鳥之巢至精密,多繫於葦竹之上是也。「蒙」當爲「蔑」。《方言》雲︰「鷦鷯,關而西謂之桑飛,或謂之蔑雀。」或曰︰一名 蒙鳩,亦以其愚也。言人不知學問,其所置身亦猶繫葦之危也。《說苑》︰「客謂孟嘗君曰︰『鷦鷯巢於葦苕,箸之髮毛,可謂完堅矣,大風至則苕折卵破子死者何 也?其所託者然也。』西方有木焉,名曰射干,莖長四寸,生於高山之上,而臨百仞之淵,木莖非能長也,所立者然也 。《本草》藥名有射干,一名烏扇。陶弘景雲︰「花白莖長,如射人之執竿。」又引阮公詩云「射干臨層城」,是生於高處也。據《本草》在《草部》中,又生南陽川穀,此雲「西方有木」,未詳。或曰︰「長四寸」卽是草,雲木,誤也。蓋生南陽,亦生西方也。射音夜。蓬生麻中,不扶而直。蘭槐之根是爲芷。其漸之滫,君子不近,庶人不服,其質非不美也,所漸者然也。蘭槐,香草,其根是爲芷也。《本草》︰「白芷一名白茝。」陶弘景雲︰「卽《離騷》所謂蘭茝也。」葢苗名蘭茝,根名芷也。弱槐當是蘭茝別名,故云「蘭槐之根是爲芷」也。漸,漬也,染也。滫,溺也。言雖香草,浸漬於溺中,則可惡也。漸,子廉反。滫,思酒反。故君子居必擇鄉,遊必就士,所以防邪僻而近中正也。物類之起,必有所始。榮辱之來,必象其德。肉腐出蟲,魚枯生蠹。怠慢忘身,禍災乃作。強自取柱,柔自取束。凡物強則以爲柱而任勞,柔則見束而約急,皆其自取也。邪穢在身,怨之所構。構,結也。言亦所自取。施薪若一,火就燥也;布薪於地,均若一,火就燥而焚之矣。平地若一,水就溼也。草木疇生,禽獸羣焉,物各從其類也。疇與儔同,類也。是故質的張而弓矢至焉,林木茂而斧斤至焉,所謂召禍也。質,射矦。的,正鵠也。樹成蔭而衆鳥息焉,醯酸而蜹聚焉。喻有德則慕之者衆。故言有召禍也,行有招辱也,君子慎其所立乎!禍福如此,不可不慎所立。所立,卽謂學也。

積土成山,風雨興焉;積水成淵,蛟龍生焉;積善成德,而神明自得,聖心備焉。神明自得,謂自通於神明。故不積蹞步,無以千里;半步曰蹞。蹞與跬同。不積小流,無以成江海。騏驥一躍,不能十步;駑馬十駕,言駑馬十度引車,則亦及騏驥之一躍。據下雲「駑馬十駕,則亦及之」,此亦當同,疑脫一句。功在不舍。鍥而舍之,朽木不折;鍥而不舍 ,金石可鏤。言立功在於不舍。舍與捨同。鍥,刻也,苦結反。《春秋傳》曰「陽虎借邑人之車,鍥其軸」也。螾無爪牙之利,筋骨之強,上食埃土,下飲黃泉,用心一也。螾與蚓同,蚯蚓也。蟹八跪而二螯,非虵蟺之穴無可寄託者,用心躁也。跪,足也。《韓子》以刖足爲刖跪。螫,蟹首上如鉞者。許叔重《說文》雲「蟹六足二螫」也。是故無冥冥之志者無昭昭之明 ,無惛惛之事者無赫赫之功。冥冥、惛惛,皆專默精誠之謂也。行衢道者不至,事兩君者不容。《爾雅》雲︰「四達謂之衢。」孫炎雲︰「衢,交道四出也。」或曰︰衢道,兩道也。不至,不能有所至。下篇有「楊朱哭衢塗」。今秦俗猶以兩爲衢,古之遺言歟?目不能兩視而明,耳不能兩聽而聰。螣蛇無足而飛,《爾雅》云:「螣,螣蛇。」郭璞雲「龍類,能興雲霧而遊其中」也。梧鼠五技而窮。「梧鼠」當爲「鼫鼠」,蓋本誤爲「鼯」字,傳寫又誤爲「梧」耳。技,才能也。言技能雖多而不能如螣蛇專一,故窮。五技,謂能飛不能上屋,能緣不能窮木,能游不能渡谷,能穴不能掩身,能走不能先人。《詩》曰 :「屍鳩在桑,其子七兮。淑人君子,其儀一兮。其儀一兮,心如結兮。」故君子結於一也。《詩》,《曹風‧屍鳩》之篇。毛雲︰「屍鳩,鴶鞠也。屍鳩之養七子,旦從上而下,暮從下而上,平均如一。善人君子,其執義亦當如屍鳩之一。執義一則用心堅固。」故曰「心如結」也。

─── Excerpted from 《M♪o 之學習筆記本《編者跋》

 

And the "way of application" still awaits its cutting and filing, its carving and polishing.

屠龍刀


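The Analects, Yang Huo (《論語》‧陽貨)

When the Master went to Wucheng, he heard the sound of strings and singing. He smiled and said, "Why use an ox-cleaver to kill a chicken?" Ziyou replied, "In the past I heard you say, Master: 'When the gentleman studies the Way, he loves others; when the common people study the Way, they are easy to direct.'" The Master said, "My friends, Yan's words are right. What I said just now was only in jest."

The saying "one's appearance springs from the mind" means that the outward "form" of one's spirit comes from the direction in which the mind is applied. This inner "image of the mind", so habitual that it goes unnoticed, can often be used to tell trades apart. Every trade has its rules and every profession its knack; once you enter a trade and take up a profession, you naturally carry a certain "air". How can one avoid being attached to "form"? If the mind can arise while "abiding nowhere", then with no "my mind", whence "my form"?

So in this episode of "The Master went to Wucheng", is there an "earlier statement" set against a "later one" from which right and wrong can be sorted out? Perhaps there is a difference between the "principle" of rites and music and the "use" of rites and music. Ziyou, as steward of Wucheng, governed by educating through rites and music, yet Confucius smiled; how could Ziyou not turn the Master's own words against him? But when the Master said he had been "joking", did he really mean that "Why use an ox-cleaver to kill a chicken?" was wrong? More likely he was unhappy to see rites and music treated as mere instruments of status. It is like the Song-dynasty maxim "preserve heavenly principle, eliminate human desire", which led to "life and death are small matters, loss of integrity is a great one" and ended in the tragedy of "ritual teaching that kills". And so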

祇『』這樣『』,不『』那樣『使』,終究難了『用大』之道 ── 無用而不通達 ── ,如如不動,應事而動,因事制宜。

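This is exactly the difference between "governing the state by the orthodox" and "waging war by the unorthodox", between the "method of learning" and the "method of applying what is learned"; one must also avoid the fault of "fighting battles on paper". On this point, The Art of War (《孫子兵法》),

Terrain, Chapter 10 (地形‧第十)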

孫子曰:地形有通者、有掛者、有支者、有隘者、有險者、有遠者。我可以往,彼可以來,曰通。通形者,先居高陽,利糧道,以戰則利。可以往,難以返,曰掛。掛形者,敵無備,出而勝之,敵若有備,出而不勝,難以返,不利。我出而不利,彼出而不利,曰支。支形者,敵雖利我,我無出也,引而去之,令敵半出而擊之,利。隘形者,我先居之,必盈之以待敵。若敵先居之,盈而勿從,不盈而從之。險形者,我先居之,必居高陽以待敵;若敵先居之,引而去之,勿從也。遠形者,勢均,難以挑戰,戰而不利。凡此六者,地之道也,將之至任,不可不察也。

故兵有走者、有馳者、有陷者、有崩者、有亂者、有北者。凡此六者,非天之災,將之過也。夫勢均,以一擊十,曰走;卒強吏弱,曰馳;吏強卒弱,曰陷;大吏怒而不服,遇敵懟而自戰,將不知其能,曰崩;將弱不嚴,教道不明,吏卒無常,陳兵縱橫,曰亂;將不能料敵,以少合衆,以弱擊強,兵無選鋒,曰北。凡此六者,敗之道也,將之至任,不可不察也。

夫地形者,兵之助也。料敵制勝,計險厄遠近,上將之道也。知此而用戰者必勝,不知此而用戰者必敗。故戰道必勝,主曰無戰,必戰可也;戰道不勝,主曰必戰,無戰可也。故進不求名,退不避罪 ,唯民是保,而利合於主,國之寶也。

視卒如嬰兒,故可以與之赴深溪;視卒如愛子,故可與之俱死 。厚而不能使,愛而不能令,亂而不能治,譬若驕子,不可用也。

知吾卒之可以擊,而不知敵之不可擊,勝之半也;知敵之可擊,而不知吾卒之不可以擊,勝之半也;知敵之可擊,知吾卒之可以擊,而不知地形之不可以戰,勝之半也。故知兵者,動而不迷,舉而不窮。故曰:知彼知己,勝乃不殆;知天知地,勝乃可全。

puts it well.

─── Excerpted from 《字詞網絡︰ WordNet 《六》 相 □ 而用 ○ !!


W!o+'s 《小伶鼬工坊演義》: Neural Networks [hyper-parameters], Part 3

About that 74-line Python program, Michael Nielsen writes:

I said above that our program gets pretty good results. What does that mean? Good compared to what? It’s informative to have some simple (non-neural-network) baseline tests to compare against, to understand what it means to perform well. The simplest baseline of all, of course, is to randomly guess the digit. That’ll be right about ten percent of the time. We’re doing much better than that!

What about a less trivial baseline? Let’s try an extremely simple idea: we’ll look at how dark an image is. For instance, an image of a 2 will typically be quite a bit darker than an image of a 1, just because more pixels are blackened out, as the following examples illustrate:

This suggests using the training data to compute average darknesses for each digit, 0, 1, 2, …, 9. When presented with a new image, we compute how dark the image is, and then guess that it’s whichever digit has the closest average darkness. This is a simple procedure, and is easy to code up, so I won’t explicitly write out the code – if you’re interested it’s in the GitHub repository. But it’s a big improvement over random guessing, getting 2,225 of the 10,000 test images correct, i.e., 22.25 percent accuracy.

It’s not difficult to find other ideas which achieve accuracies in the 20 to 50 percent range. If you work a bit harder you can get up over 50 percent. But to get much higher accuracies it helps to use established machine learning algorithms. Let’s try using one of the best known algorithms, the support vector machine or SVM. If you’re not familiar with SVMs, not to worry, we’re not going to need to understand the details of how SVMs work. Instead, we’ll use a Python library called scikit-learn, which provides a simple Python interface to a fast C-based library for SVMs known as LIBSVM.

If we run scikit-learn’s SVM classifier using the default settings, then it gets 9,435 of 10,000 test images correct. (The code is available here.) That’s a big improvement over our naive approach of classifying an image based on how dark it is. Indeed, it means that the SVM is performing roughly as well as our neural networks, just a little worse. In later chapters we’ll introduce new techniques that enable us to improve our neural networks so that they perform much better than the SVM.

That’s not the end of the story, however. The 9,435 of 10,000 result is for scikit-learn’s default settings for SVMs. SVMs have a number of tunable parameters, and it’s possible to search for parameters which improve this out-of-the-box performance. I won’t explicitly do this search, but instead refer you to this blog post by Andreas Mueller if you’d like to know more. Mueller shows that with some work optimizing the SVM’s parameters it’s possible to get the performance up above 98.5 percent accuracy. In other words, a well-tuned SVM only makes an error on about one digit in 70. That’s pretty good! Can neural networks do better?

In fact, they can. At present, well-designed neural networks outperform every other technique for solving MNIST, including SVMs. The current (2013) record is classifying 9,979 of 10,000 images correctly. This was done by Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. We’ll see most of the techniques they used later in the book. At that level the performance is close to human-equivalent, and is arguably better, since quite a few of the MNIST images are difficult even for humans to recognize with confidence, for example:

I trust you’ll agree that those are tough to classify! With images like these in the MNIST data set it’s remarkable that neural networks can accurately classify all but 21 of the 10,000 test images. Usually, when programming we believe that solving a complicated problem like recognizing the MNIST digits requires a sophisticated algorithm. But even the neural networks in the Wan et al paper just mentioned involve quite simple algorithms, variations on the algorithm we’ve seen in this chapter. All the complexity is learned, automatically, from the training data. In some sense, the moral of both our results and those in more sophisticated papers, is that for some problems:

sophisticated algorithm ≤ simple learning algorithm + good training data.
───

 

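As for pure guessing, even the figure of "ten percent" is probably just a taken-for-granted assumption, so there is not much more to say about it.

As for using "average darkness", consider nothing more than people's handwriting habits: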

HandWritingDigits

 

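Digits are written large or small, with thick strokes or thin, so it is clear that this approach cannot work well. Still, reading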

"""
mnist_average_darkness
~~~~~~~~~~~~~~~~~~~~~~

A naive classifier for recognizing handwritten digits from the MNIST
data set.  The program classifies digits based on how dark they are
--- the idea is that digits like "1" tend to be less dark than digits
like "8", simply because the latter has a more complex shape.  When
shown an image the classifier returns whichever digit in the training
data had the closest average darkness.

The program works in two steps: first it trains the classifier, and
then it applies the classifier to the MNIST test data to see how many
digits are correctly classified.

Needless to say, this isn't a very good way of recognizing handwritten
digits!  Still, it's useful to show what sort of performance we get
from naive ideas."""

#### Libraries
# Standard library
from collections import defaultdict

# My libraries
import mnist_loader

def main():
    training_data, validation_data, test_data = mnist_loader.load_data()
    # training phase: compute the average darknesses for each digit,
    # based on the training data
    avgs = avg_darknesses(training_data)
    # testing phase: see how many of the test images are classified
    # correctly
    num_correct = sum(int(guess_digit(image, avgs) == digit)
                      for image, digit in zip(test_data[0], test_data[1]))
    print "Baseline classifier using average darkness of image."
    print "%s of %s values correct." % (num_correct, len(test_data[1]))

def avg_darknesses(training_data):
    """ Return a defaultdict whose keys are the digits 0 through 9.
    For each digit we compute a value which is the average darkness of
    training images containing that digit.  The darkness for any
    particular image is just the sum of the darknesses for each pixel."""
    digit_counts = defaultdict(int)
    darknesses = defaultdict(float)
    for image, digit in zip(training_data[0], training_data[1]):
        digit_counts[digit] += 1
        darknesses[digit] += sum(image)
    avgs = defaultdict(float)
    for digit, n in digit_counts.iteritems():
        avgs[digit] = darknesses[digit] / n
    return avgs

def guess_digit(image, avgs):
    """Return the digit whose average darkness in the training data is
    closest to the darkness of ``image``.  Note that ``avgs`` is
    assumed to be a defaultdict whose keys are 0...9, and whose values
    are the corresponding average darknesses across the training data."""
    darkness = sum(image)
    distances = {k: abs(v-darkness) for k, v in avgs.iteritems()}
    return min(distances, key=distances.get)

if __name__ == "__main__":
    main()
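
The listing above is written for Python 2 and needs the MNIST files. As a hedged companion, here is a small Python 3 sketch of the same average-darkness idea on toy data; the four-pixel "images", their labels, and the dictionary-based return value are assumptions made purely so that the block runs on its own.

```python
from collections import defaultdict

def avg_darknesses(images, labels):
    """Map each label to the mean total darkness of its training images."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for image, label in zip(images, labels):
        totals[label] += sum(image)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}

def guess_digit(image, avgs):
    """Return the label whose average darkness is closest to this image's."""
    darkness = sum(image)
    return min(avgs, key=lambda label: abs(avgs[label] - darkness))

# Toy "images": flat lists of pixel intensities in [0, 1] (hypothetical data)
train_images = [[0.0, 0.1, 0.0, 0.9], [0.1, 0.0, 0.0, 0.8],
                [0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.7]]
train_labels = [1, 1, 8, 8]

avgs = avg_darknesses(train_images, train_labels)
light_guess = guess_digit([0.0, 0.0, 0.2, 0.9], avgs)
dark_guess = guess_digit([0.9, 0.9, 0.9, 0.9], avgs)
print(light_guess, dark_guess)
```

The classifier only ever looks at one number per image, the summed pixel intensity, which is why stroke thickness and digit size affect it so strongly.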


 

is rather interesting. On a Raspberry Pi 3, the measured result is as follows:

pi@raspberrypi:~/neural-networks-and-deep-learning/src $ python mnist_average_darkness.py
Baseline classifier using average darkness of image.
2225 of 10000 values correct.

If you ask what a Support Vector Machine (SVM) is, the Wikipedia entry says:

Support vector machine (支持向量機)

A support vector machine (SVM) is a supervised learning method that is widely applicable to statistical classification and regression analysis.

Support vector machines belong to the family of generalized linear classifiers and can also be regarded as a special case of Tikhonov regularization. The distinguishing property of this family of classifiers is that they simultaneously minimize the empirical error and maximize the geometric margin, which is why the support vector machine is also called the maximum margin classifier.

Introduction

A support vector machine constructs one or more hyperplanes in a high-dimensional (even infinite-dimensional) space to classify data points; this hyperplane is the classification boundary. Intuitively, a good classification boundary should be as far as possible from the nearest training data points, because this lowers the generalization error of the classifier. In a support vector machine, the distance between the classification boundary and the nearest training data points is called the margin; the goal of the support vector machine is to find the hyperplane with the largest margin and use it as the classification boundary.

The support vectors of a support vector machine are the training data points closest to the classification boundary. An important property follows from the SVM optimization problem: the classification boundary is determined by the support vectors alone and does not depend on the other data points. This is why they are called "support vectors".

We usually want the classification process to be a machine learning process. The data points need not be points in \mathbb{R}^2; they can be points in any \mathbb{R}^p (statistics notation) or \mathbb{R}^n (computer science notation). We want to separate these points with an (n-1)-dimensional hyperplane, which is usually called a linear classifier. Many classifiers meet this requirement, but we also want to find the best separating plane, namely the one that maximizes the margin between the data points of the two classes; this plane is also called the maximum-margin hyperplane. If such a plane can be found, the classifier is called a maximum-margin classifier.

512px-Classifier.svg

Many classifiers (hyperplanes) can separate the data, but only one achieves the maximum margin.

Problem definition

We consider an n-point test set \mathcal{D} of the following form:

\mathcal{D}=\{(\mathbf{x}_i,y_i)| \mathbf{x}_i \in \mathbb{R}^p, y_i \in \{-1,1\} \}_{i=1}^{n}

where y_i is either -1 or 1.

The hyperplane can be written mathematically as:

\mathbf{w}\cdot\mathbf{x} - b=0

where \mathbf{x} is a point on the hyperplane and \mathbf{w} is a vector perpendicular to the hyperplane.

From geometry we know that the vector \mathbf{w} is perpendicular to the separating hyperplane. The offset b is added to increase the margin. Without b, the hyperplane would be forced to pass through the origin, which would limit the flexibility of the method.

Since we require the maximum margin, we need to know the support vectors and the hyperplanes that are parallel to the optimal hyperplane and closest to the support vectors. These parallel hyperplanes can be expressed by the family of equations:

\mathbf{w}\cdot\mathbf{x} - b=1

or

\mathbf{w}\cdot\mathbf{x} - b=-1

Since \mathbf{w} is only the normal vector of the hyperplane and its length is not fixed, it is a variable, so the 1 and -1 on the right-hand sides are merely constants chosen for computational convenience; any other pair of constants that are negatives of each other would do as well.

If the training data are linearly separable, we can find two such hyperplanes with no sample points between them and with the distance between them as large as possible. Simple geometry shows that the distance between the two hyperplanes is 2/|\mathbf{w}|, so we need to minimize |\mathbf{w}|. At the same time, to keep every sample point outside the margin region between the hyperplanes, we need every i to satisfy one of the following conditions:

\mathbf{w}\cdot\mathbf{x_i} - b \ge 1

or

\mathbf{w}\cdot\mathbf{x_i} - b \le -1

These two expressions can be written as:

<dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/b/5/9/b593983ab96553a2b4ad3219e3107fe0.png" alt="y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1, \quad 1 
\le i \le n.\qquad\qquad (1)" /></span></dd></dl> <h3><img class="alignnone size-full wp-image-52432" src="http://www.freesandal.org/wp-content/uploads/SVM_margins.png" alt="SVM_margins" width="740" height="903" /></h3> <span style="color: #999999;">設樣本屬於兩個類,用該樣本訓練svm得到的最大間隔超平面。在超平面上的樣本點也稱為支持向量。</span> <h3><span id=".E5.8E.9F.E5.9E.8B" class="mw-headline" style="color: #808080;">原型</span></h3> <span style="color: #808080;">現在尋找最佳超平面這個問題就變成了在(1)這個約束條件下最小化|<i><b>w</b></i>|.這是一個<a style="color: #808080;" title="二次規劃" href="https://zh.wikipedia.org/wiki/%E4%BA%8C%E6%AC%A1%E8%A7%84%E5%88%92">二次規劃</a>(QPquadratic programming)<a style="color: #808080;" title="最優化" href="https://zh.wikipedia.org/wiki/%E6%9C%80%E4%BC%98%E5%8C%96">最優化</a>中的問題。</span>  <span style="color: #808080;">更清楚的表示:</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/3/7/9376db80f8e3c1044354b455431b71ae.png" alt="\arg\min_{\mathbf{w},b}{||\mathbf{w}||^2\over2}" />,滿足<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/1/6/b/16bf786b49cd616f713b9414a50b40e5.png" alt="y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1" /></span></dd></dl><span style="color: #808080;">其中<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/4/d/e/4de0f8ba6b80f41fe9152789af172da1.png" alt="i = 1, \dots, n" />。 1/2因子是為了數學上表達的方便加上的。</span>  <span style="color: #808080;">解如上約束問題,通常的想法可能是使用非負<a style="color: #808080;" title="拉格朗日乘數" href="https://zh.wikipedia.org/wiki/%E6%8B%89%E6%A0%BC%E6%9C%97%E6%97%A5%E4%B9%98%E6%95%B0">拉格朗日乘數</a><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/0/d/90db017b80d63780533fbc74fb227dba.png" alt="\alpha_i" />於下式:</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/4/9/a/49a160099f1dd1f4073081e88ef4fd0c.png" alt="\arg\min_{\mathbf{w},b } 
\max_{\boldsymbol{\alpha}\geq 0 } \left\{ \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n}{\alpha_i[y_i(\mathbf{w}\cdot \mathbf{x_i} - b)-1]} \right\}" /></span></dd></dl><span style="color: #808080;">此式表明我們尋找一個鞍點。這樣所有可以被<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/a/b/8/ab8f0a523372887b3ba3db53ff310f62.png" alt="y_i(\mathbf{w}\cdot\mathbf{x_i} - b) - 1 > 0 " />分離的點就無關緊要了,因為我們必須設置相應的<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/0/d/90db017b80d63780533fbc74fb227dba.png" alt="\alpha_i" />為零。</span>  <span style="color: #808080;">這個問題現在可以用標準二次規劃技術標準和程序解決。結論可以表示為如下訓練向量的線性組合</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/3/e/2/3e26ffdf1194a3a374c6d0f83009fad7.png" alt="\mathbf{w} = \sum_{i=1}^n{\alpha_i y_i\mathbf{x_i}}" /></span></dd></dl><span style="color: #808080;">其中只有很少的<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/0/d/90db017b80d63780533fbc74fb227dba.png" alt="\alpha_i" />會大於0.對應的<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/7/f/f/7ffdb7137a8c04a8ad301b25500448f3.png" alt="\mathbf{x_i}" />就是<b>支持向量</b>,這些支持向量在邊緣上並且滿足<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/c/0/1/c019d0fbcb95223407e72453934f2e89.png" alt="y_i(\mathbf{w}\cdot\mathbf{x_i} - b) = 1" />.由此可以推導出支持向量也滿足:</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/5/f/4/5f481046a6c4d93568aa3b4db5fb80d8.png" alt="\mathbf{w}\cdot\mathbf{x_i} - b = 1 / y_i = y_i \iff b = \mathbf{w}\cdot\mathbf{x_i} - y_i" /></span></dd></dl><span style="color: #808080;">因此允許定義偏移量<img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/2/e/92eb5ffee6ae2fec3ad71c777531578f.png" alt="b" />.在實際應用中,把所有支持向量<img 
class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/e/2/e/e2e05e7b7f6f05a74288eb23a1e5cd46.png" alt="N_{SV}" />的偏移量做平均後魯棒性更強:</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/6/1/a/61a560e9c604036e390332e81461c21a.png" alt="b = \frac{1}{N_{SV}} \sum_{i=1}^{N_{SV}}{(\mathbf{w}\cdot\mathbf{x_i} - y_i)}" />。</span></dd></dl><span style="color: #808080;">───</span>     <span style="color: #666699;">細心的讀者當可發現它與『感知器網絡』密切的淵源︰</span> <h1 id="firstHeading" class="firstHeading" lang="en"><span style="color: #666699;"><a style="color: #666699;" href="https://en.wikipedia.org/wiki/Perceptron">Perceptron</a></span></h1> <h2><span id="Definition" class="mw-headline" style="color: #808080;">Definition</span></h2> <span style="color: #808080;">In the modern sense, the perceptron is an algorithm for learning a binary classifier: a function that maps its input <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/d/d/9dd4e461268c8034f5c8564e155c67a6.png" alt="x" /> (a real-valued <a style="color: #808080;" title="Vector space" href="https://en.wikipedia.org/wiki/Vector_space">vector</a>) to an output value <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/5/0/b/50bbd36e1fd2333108437a2ca378be62.png" alt="f(x)" /> (a single <a style="color: #808080;" title="Binary function" href="https://en.wikipedia.org/wiki/Binary_function">binary</a> value):</span>  <dl><dd><span style="color: #808080;"><img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/c/4/1/c417e0191fa26561f6947ce57c182617.png" alt=" f(x) = \begin{cases}1 & \text{if }w \cdot x + b > 0\\0 & \text{otherwise}\end{cases} " /></span></dd></dl><span style="color: #808080;">where <span class="texhtml mvar">w</span> is a vector of real-valued weights, <img class="mwe-math-fallback-image-inline tex" 
src="https://upload.wikimedia.org/math/2/b/6/2b6f207a3abc53dc275356f5b7f67d12.png" alt="w \cdot x" /> is the <a style="color: #808080;" title="Dot product" href="https://en.wikipedia.org/wiki/Dot_product">dot product</a> <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/e/a/4/ea431f325546eea5fbb889946ed8641a.png" alt="\sum_{i=0}^m w_i x_i" />, where m is the number of inputs to the perceptron and <span class="texhtml mvar">b</span> is the <i>bias</i>. The bias shifts the decision boundary away from the origin and does not depend on any input value.</span>  <span style="color: #808080;">The value of <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/5/0/b/50bbd36e1fd2333108437a2ca378be62.png" alt="f(x)" /> (0 or 1) is used to classify <span class="texhtml mvar">x</span> as either a positive or a negative instance, in the case of a binary classification problem. If <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/9/2/e/92eb5ffee6ae2fec3ad71c777531578f.png" alt="b" /> is negative, then the weighted combination of inputs must produce a positive value greater than <img class="mwe-math-fallback-image-inline tex" src="https://upload.wikimedia.org/math/2/5/5/255aa5ce12bea936f4447c696a34332b.png" alt="|b|" /> in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the <a style="color: #808080;" title="Decision boundary" href="https://en.wikipedia.org/wiki/Decision_boundary">decision boundary</a>. The perceptron learning algorithm does not terminate if the learning set is not <a class="mw-redirect" style="color: #808080;" title="Linearly separable" href="https://en.wikipedia.org/wiki/Linearly_separable">linearly separable</a>. If the vectors are not linearly separable learning will never reach a point where all vectors are classified properly. 
The most famous example of the perceptron's inability to solve problems with linearly nonseparable vectors is the Boolean exclusive-or problem. The solution spaces of decision boundaries for all binary functions and learning behaviors are studied in the reference.<sup id="cite_ref-7" class="reference"><a style="color: #808080;" href="https://en.wikipedia.org/wiki/Perceptron#cite_note-7">[7]</a></sup></span>  <span style="color: #808080;">In the context of neural networks, a perceptron is an <a style="color: #808080;" title="Artificial neuron" href="https://en.wikipedia.org/wiki/Artificial_neuron">artificial neuron</a> using the <a style="color: #808080;" title="Heaviside step function" href="https://en.wikipedia.org/wiki/Heaviside_step_function">Heaviside step function</a> as the activation function. The perceptron algorithm is also termed the <b>single-layer perceptron</b>, to distinguish it from a <a style="color: #808080;" title="Multilayer perceptron" href="https://en.wikipedia.org/wiki/Multilayer_perceptron">multilayer perceptron</a>, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest <a style="color: #808080;" title="Feedforward neural network" href="https://en.wikipedia.org/wiki/Feedforward_neural_network">feedforward neural network</a>.</span> <h2><span id="Learning_algorithm" class="mw-headline" style="color: #808080;">Learning algorithm</span></h2> <span style="color: #808080;">Below is an example of a learning algorithm for a (single-layer) perceptron. For <a style="color: #808080;" title="Multilayer perceptron" href="https://en.wikipedia.org/wiki/Multilayer_perceptron">multilayer perceptrons</a>, where a hidden layer exists, more sophisticated algorithms such as <a style="color: #808080;" title="Backpropagation" href="https://en.wikipedia.org/wiki/Backpropagation">backpropagation</a> must be used. 
Alternatively, methods such as the <a style="color: #808080;" title="Delta rule" href="https://en.wikipedia.org/wiki/Delta_rule">delta rule</a> can be used if the function is non-linear and differentiable, although the one below will work as well.</span>
<span style="color: #808080;">When multiple perceptrons are combined in an artificial neural network, each output neuron operates independently of all the others; thus, learning each output can be considered in isolation.</span>
<img class="alignnone size-full wp-image-51643" src="http://www.freesandal.org/wp-content/uploads/Perceptron_example.svg.png" alt="Perceptron_example.svg" width="500" height="500" />
<span style="color: #808080;">A diagram showing a perceptron updating its linear boundary as more training examples are added.</span>
─── Excerpted from 《<a href="http://www.freesandal.org/?m=20160413">W!o+ 的《小伶鼬工坊演義》︰神經網絡【Perceptron】七</a>》
<span style="color: #666699;">So it should come as no great surprise that the 『default』 settings reach a 『recognition rate』 of 9,435 / 10,000???</span>
<span style="color: #808080;"><strong>【scikits installation】</strong></span>
<span style="color: #808080;">sudo apt-get install python-scikits-learn</span>
<span style="color: #808080;"><strong>【mnist_svm.py】</strong></span>
<pre class="lang:python decode:true ">"""
mnist_svm
~~~~~~~~~

A classifier program for recognizing handwritten digits from the MNIST
data set, using an SVM classifier."""

#### Libraries
# My libraries
import mnist_loader

# Third-party libraries
from sklearn import svm

def svm_baseline():
    training_data, validation_data, test_data = mnist_loader.load_data()
    # train
    clf = svm.SVC()
    clf.fit(training_data[0], training_data[1])
    # test
    predictions = [int(a) for a in clf.predict(test_data[0])]
    num_correct = sum(int(a == y) for a, y in zip(predictions, test_data[1]))
    print "Baseline classifier using an SVM."
    print "%s of %s values correct." % (num_correct, len(test_data[1]))

if __name__ == "__main__":
    svm_baseline()
</pre>
<span style="color: #808080;"><strong>【Test results】</strong></span>
<pre class="lang:python decode:true ">pi@raspberrypi:~/neural-networks-and-deep-learning/src $ python mnist_svm.py 
Baseline classifier using an SVM.
9435 of 10000 values correct.

W!o+ 的《小伶鼬工坊演義》︰神經網絡【hyper-parameters】二

With Michael Nielsen's introduction of the 『mnist_loader.py』 program, Chapter 1 is nearing its end. This is a good occasion to talk about the Python 『struct』 library:

7.3. struct — Interpret strings as packed binary data

This module performs conversions between Python values and C structs represented as Python strings. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.

Note

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.

……

7.3.2.1. Byte Order, Size, and Alignment

By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table:

Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none

If the first character is not one of these, '@' is assumed.
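
The effect of the prefix character can be checked directly. The following is a minimal sketch using only `struct.pack` and `struct.calcsize`; the value 2051 is borrowed from the MNIST image-file magic number purely as a convenient example, and the variable names are illustrative.

```python
import struct

# The same 32-bit value packed under different byte-order prefixes.
value = 2051  # 0x00000803

big = struct.pack(">I", value)      # big-endian, standard size
little = struct.pack("<I", value)   # little-endian, standard size
network = struct.pack("!I", value)  # network order, same bytes as big-endian

# Standard size ("=") adds no pad bytes: 1 byte for "B" plus 4 for "I".
standard_size = struct.calcsize("=BI")
# Native alignment ("@", the default) may add pad bytes; the result
# depends on the platform, so no fixed value is assumed here.
native_size = struct.calcsize("@BI")

print(repr(big))
print(repr(little))
print(standard_size, native_size)
```

Since the MNIST files store their header integers MSB first, the ">" prefix is the one that matters when reading them.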

……

7.3.2.2. Format Characters

Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

Format  C Type              Python type         Standard size  Notes
x       pad byte            no value
c       char                string of length 1  1
b       signed char         integer             1              (3)
B       unsigned char       integer             1              (3)
?       _Bool               bool                1              (1)
h       short               integer             2              (3)
H       unsigned short      integer             2              (3)
i       int                 integer             4              (3)
I       unsigned int        integer             4              (3)
l       long                integer             4              (3)
L       unsigned long       integer             4              (3)
q       long long           integer             8              (2), (3)
Q       unsigned long long  integer             8              (2), (3)
f       float               float               4              (4)
d       double              float               8              (4)
s       char[]              string
p       char[]              string
P       void *              integer                            (5), (3)
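
The 'Standard size' column can be confirmed with `struct.calcsize`, and a mixed record can be packed and unpacked in one call. This is only an illustrative sketch: the record layout (`"<BhI4s"`) and its field values are made up for the example and do not come from any file format discussed here.

```python
import struct

# Standard sizes for a few format characters, using the "<" prefix so
# that standard size (not native size) is in effect.
sizes = {fmt: struct.calcsize("<" + fmt) for fmt in "bBhHiIqQfd"}

# A hypothetical record: unsigned byte, short, unsigned int, 4-byte string.
record = struct.pack("<BhI4s", 7, -2, 60000, b"MNIS")
fields = struct.unpack("<BhI4s", record)

print(sizes)
print(len(record), fields)
```

With standard size and no alignment, the record length is simply the sum of the field sizes, 1 + 2 + 4 + 4 bytes.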

───

 

It can be used to read the original 『MNIST』 database of handwritten Arabic numerals:

FILE FORMATS FOR THE MNIST DATABASE

TRAINING SET LABEL FILE (train-labels-idx1-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  60000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

The labels values are 0 to 9.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

TEST SET LABEL FILE (t10k-labels-idx1-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  10000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

The labels values are 0 to 9.

TEST SET IMAGE FILE (t10k-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  10000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
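
The layouts above translate almost line for line into `struct` calls. The sketch below is one possible reading of them, not the loader used in the book: the function names `read_idx_images` and `read_idx_labels` are hypothetical, and the data is a tiny synthetic stand-in (2 images of 2x3 pixels) built in memory so that no MNIST file is needed.

```python
import io
import struct

def read_idx_images(stream):
    """Parse an image stream: 4 big-endian uint32 header fields, then pixels."""
    magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
    if magic != 2051:  # 0x00000803
        raise ValueError("unexpected image magic number: %d" % magic)
    size = rows * cols
    pixels = bytearray(stream.read(count * size))
    if len(pixels) != count * size:
        raise ValueError("truncated image data")
    # Pixels are stored row-wise, one image after another.
    return [pixels[i * size:(i + 1) * size] for i in range(count)], rows, cols

def read_idx_labels(stream):
    """Parse a label stream: 2 big-endian uint32 header fields, then labels."""
    magic, count = struct.unpack(">II", stream.read(8))
    if magic != 2049:  # 0x00000801
        raise ValueError("unexpected label magic number: %d" % magic)
    labels = bytearray(stream.read(count))
    if len(labels) != count:
        raise ValueError("truncated label data")
    return list(labels)

# Synthetic stand-ins for the real files (assumption: same layout, tiny size).
fake_images = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(bytearray(range(12)))
fake_labels = struct.pack(">II", 2049, 2) + bytes(bytearray([5, 0]))

images, rows, cols = read_idx_images(io.BytesIO(fake_images))
labels = read_idx_labels(io.BytesIO(fake_labels))
print(len(images), rows, cols, labels)
```

Checking the magic number first is a cheap guard against opening a label file as an image file, or a file that is still gzip-compressed.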

───

 

Hopefully a few short lines of interactive code are enough to convey the idea:

>>> import struct
>>> import numpy as np

>>> with open("train-images.idx3-ubyte","rb") as imagefile:
...     magic, ni, nr, nc = struct.unpack(">IIII", imagefile.read(16))
...     images = np.fromfile(imagefile, dtype=np.uint8).reshape(60000,784)
... 
>>> len(images)
60000
>>> type(images[0])
<type 'numpy.ndarray'>
>>> len((images[0]))
784

>>> with open("train-labels.idx1-ubyte", "rb") as labelfile:
...     magic, ni =  struct.unpack(">II", labelfile.read(8))
...     labels = np.fromfile(labelfile, dtype=np.uint8)
... 
>>> len(labels)
60000
>>> labels[0]
5

>>> import matplotlib.pyplot as plt
>>> img = images[0].reshape(28,28)
>>> plt.imshow(img,cmap='Greys', interpolation='nearest')
<matplotlib.image.AxesImage object at 0x30f7b10>
>>> plt.show()
>>> 

 

Figure 5
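
When no display is available (for instance over a plain SSH session to the Pi), a text rendering can stand in for `plt.imshow`. This is a small sketch and not part of the original session: `render_ascii` and the threshold of 128 are assumptions, and the 4x4 sample is a made-up stand-in for a 28x28 MNIST image.

```python
def render_ascii(pixels, rows, cols, threshold=128):
    """Return a text picture: '#' for pixels >= threshold, '.' otherwise."""
    lines = []
    for r in range(rows):
        row = pixels[r * cols:(r + 1) * cols]
        lines.append("".join("#" if p >= threshold else "." for p in row))
    return "\n".join(lines)

# Hypothetical 4x4 image with values in 0..255 (255 = foreground).
sample = [0, 255, 255,   0,
          0,   0, 255,   0,
          0,   0, 255,   0,
          0, 255, 255, 255]

picture = render_ascii(sample, 4, 4)
print(picture)
```

With the arrays from the session above, the call would be `render_ascii(images[0], 28, 28)`, since a flat 784-element row can be sliced and iterated the same way as the list used here.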