W!o+'s 《小伶鼬工坊演義》: Neural Networks and Deep Learning [Introduction]

What more can one say about a book so compact, complete, and well written? After much deliberation, I decided simply to share past reading notes and stray musings. The subject, after all, is at once old and new, still awaiting the spark that sets off fresh creativity. Perhaps a single insight is all it would take to change the future of artificial intelligence?

Michael Nielsen states his purpose right at the opening of the first chapter:

The human visual system is one of the wonders of the world. Consider the following sequence of handwritten digits:

Most people effortlessly recognize those digits as 504192. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. And yet human vision involves not just V1, but an entire series of visual cortices – V2, V3, V4, and V5 – doing progressively more complex image processing. We carry in our heads a supercomputer, tuned by evolution over hundreds of millions of years, and superbly adapted to understand the visual world. Recognizing handwritten digits isn’t easy. Rather, we humans are stupendously, astoundingly good at making sense of what our eyes show us. But nearly all that work is done unconsciously. And so we don’t usually appreciate how tough a problem our visual systems solve.

The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize digits like those above. What seems easy when we do it ourselves suddenly becomes extremely difficult. Simple intuitions about how we recognize shapes – “a 9 has a loop at the top, and a vertical stroke in the bottom right” – turn out to be not so simple to express algorithmically. When you try to make such rules precise, you quickly get lost in a morass of exceptions and caveats and special cases. It seems hopeless.

Neural networks approach the problem in a different way. The idea is to take a large number of handwritten digits, known as training examples,

[figure: 100 handwritten training digits]

and then develop a system which can learn from those training examples. In other words, the neural network uses the examples to automatically infer rules for recognizing handwritten digits. Furthermore, by increasing the number of training examples, the network can learn more about handwriting, and so improve its accuracy. So while I’ve shown just 100 training digits above, perhaps we could build a better handwriting recognizer by using thousands or even millions or billions of training examples.

In this chapter we’ll write a computer program implementing a neural network that learns to recognize handwritten digits. The program is just 74 lines long, and uses no special neural network libraries. But this short program can recognize digits with an accuracy over 96 percent, without human intervention. Furthermore, in later chapters we’ll develop ideas which can improve accuracy to over 99 percent. In fact, the best commercial neural networks are now so good that they are used by banks to process cheques, and by post offices to recognize addresses.

We’re focusing on handwriting recognition because it’s an excellent prototype problem for learning about neural networks in general. As a prototype it hits a sweet spot: it’s challenging – it’s no small feat to recognize handwritten digits – but it’s not so difficult as to require an extremely complicated solution, or tremendous computational power. Furthermore, it’s a great way to develop more advanced techniques, such as deep learning. And so throughout the book we’ll return repeatedly to the problem of handwriting recognition. Later in the book, we’ll discuss how these ideas may be applied to other problems in computer vision, and also in speech, natural language processing, and other domains.

Of course, if the point of the chapter was only to write a computer program to recognize handwritten digits, then the chapter would be much shorter! But along the way we’ll develop many key ideas about neural networks, including two important types of artificial neuron (the perceptron and the sigmoid neuron), and the standard learning algorithm for neural networks, known as stochastic gradient descent. Throughout, I focus on explaining why things are done the way they are, and on building your neural networks intuition. That requires a lengthier discussion than if I just presented the basic mechanics of what’s going on, but it’s worth it for the deeper understanding you’ll attain. Amongst the payoffs, by the end of the chapter we’ll be in position to understand what deep learning is, and why it matters.

………
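The chapter's closing paragraph names two types of artificial neuron. As a quick taste, here is a minimal sketch of my own (not from the book, though it follows the book's definitions): the perceptron outputs 1 exactly when its weighted input w·x + b is positive, while the sigmoid neuron smooths that step into a value strictly between 0 and 1.

import numpy as np

def perceptron(w, x, b):
    """Step neuron: fires 1 when the weighted input w.x + b is positive."""
    return 1 if np.dot(w, x) + b > 0 else 0

def sigmoid_neuron(w, x, b):
    """Smooth neuron: its output varies continuously in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

w, x, b = np.array([0.6, 0.4]), np.array([1.0, 0.0]), -0.5
print(perceptron(w, x, b), sigmoid_neuron(w, x, b))  # 1 and roughly 0.525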


This sets out the book's purpose: to use the single theme of recognizing handwritten digits to string together the essentials of neural networks and deep learning, so that readers may glimpse the whole from the smallest possible text and infer much from little. He therefore keeps the mathematics to a minimum and explains the important principles and concepts in plain language wherever he can. He usually does not trouble the reader with the assorted jargon of 'machine learning':

Machine learning is a multidisciplinary field that has risen to prominence over the past twenty-odd years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and several other disciplines. Machine learning theory is mainly concerned with designing and analysing algorithms that let computers "learn" automatically. Such algorithms extract regularities from data automatically and use those regularities to make predictions about unseen data. Because learning algorithms rest on a great deal of statistical theory, machine learning is particularly closely tied to inferential statistics and is also known as statistical learning theory. On the design side, machine learning theory focuses on learning algorithms that can actually be implemented and that work in practice. Many inference problems admit no general procedure, so part of machine learning research is devoted to developing tractable approximation algorithms.

Machine learning has been applied widely in data mining, computer vision, natural language processing, biometric identification, search engines, medical diagnosis, credit-card fraud detection, securities-market analysis, DNA sequencing, speech and handwriting recognition, strategy games, robotics, and other fields.

Definitions

Machine learning has been defined in several ways:

  • Machine learning is a science of artificial intelligence whose main object of study is artificial intelligence itself, in particular how to improve the performance of concrete algorithms through learning from experience.
  • Machine learning is the study of computer algorithms that improve automatically through experience.
  • Machine learning is the use of data or past experience to optimize the performance criteria of a computer program.

A frequently cited English definition is: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. For the handwriting problem above, T would be classifying digit images, P the fraction classified correctly, and E a set of labelled training digits.

Categories

Machine learning can be divided into the following categories:

  • Supervised learning learns a function from a given set of training data and predicts the result for new data using that function. The training set must contain both inputs and outputs, that is, features and targets, and the targets are labelled by hand. Common supervised algorithms include regression analysis and statistical classification (for a concrete contrast with the next category, see the sketch after this excerpt).
  • Unsupervised learning differs in that the training set carries no human-supplied labels. Clustering is a common unsupervised algorithm.
  • Semi-supervised learning falls between supervised and unsupervised learning.
  • Reinforcement learning learns what actions to take through observation. Every action affects the environment, and the learner makes its judgments from the feedback it observes in the surrounding environment.

───
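To make the first two categories concrete, here is a minimal sketch of my own (not from the book or the excerpt above) using plain numpy: a supervised least-squares fit of labelled pairs, followed by an unsupervised two-cluster grouping of unlabelled points.

import numpy as np

rng = np.random.RandomState(1)

# Supervised: learn a function from labelled pairs (x, y), then predict.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + 0.5 * rng.randn(100)      # targets supplied "by hand"
slope, intercept = np.polyfit(x, y, deg=1)    # fit y ~ slope*x + intercept
print("prediction at x = 5:", slope * 5 + intercept)

# Unsupervised: no labels; look for structure (two clusters) in inputs alone.
pts = np.concatenate([rng.randn(50), rng.randn(50) + 8.0])
centers = np.array([pts.min(), pts.max()])    # crude two-mean clustering
for _ in range(10):
    nearest = np.abs(pts[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([pts[nearest == k].mean() for k in (0, 1)])
print("cluster centers:", centers)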


Granted, sparing the reader this thicket of terminology greatly benefits first-time readers, yet for the jungle that is 'artificial intelligence' it is perhaps a little thin. Take, for example, the book's annotated program

network.py

[figure: tikz12, the network diagram accompanying the code]


"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))


This program falls under

Supervised learning

Supervised learning is a machine-learning method that learns or builds a model (a function, the learning model) from training data and uses that model to infer new instances. The training data consist of input objects (usually vectors) and the desired outputs. The function's output may be a continuous value (in which case the task is called regression) or a predicted class label (called classification).

The task of a supervised learner, after observing some training examples (inputs and their desired outputs), is to predict the output of the function for any possible input. To achieve this, the learner must generalize from the available data to unseen situations in a "reasonable" way (see inductive bias). The analogous process in human and animal perception is usually called concept learning.

Overview

Supervised learning models come in two forms. In the most general form, supervised learning produces a global model that maps input objects to desired outputs; in the other, the mapping is implemented by local models (as in case-based reasoning and the nearest-neighbour method). To solve a given supervised-learning problem, such as handwriting recognition, the following steps must be considered (a toy walk-through follows the list):

  1. Determine the type of training examples. Before anything else, the engineer should decide what kind of data will serve as examples: a single handwritten character, say, or a whole handwritten word, or an entire line of handwriting.
  2. Gather the training data. The data must reflect real-world use, so input objects and their corresponding outputs are collected from human experts or from measurements (by machines or sensors).
  3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input objects are represented. Traditionally, an input object is converted into a feature vector containing a number of features that describe the object. Because of the curse of dimensionality, the number of features should not be too large, but it must be large enough to predict the output accurately.
  4. Determine the structure of the learned function and the corresponding learning algorithm. The engineer might choose artificial neural networks or decision trees, for example.
  5. Complete the design. The engineer then runs the learning algorithm on the gathered data. The algorithm's parameters may be tuned by running it on a subset of the data (called a validation set) or by cross-validation. After tuning, the algorithm can be run on a test set separate from the training set.
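As a toy walk-through of these five steps (my own illustration, not part of the excerpt), the sketch below reuses the Network class from network.py above on a synthetic task: classifying 2-D points by the sign of their first coordinate, with a held-out validation set used to compare two learning rates.

import numpy as np
from network import Network   # the class listed above

rng = np.random.RandomState(0)

def make_example():
    x = rng.randn(2, 1)                      # step 3: the feature vector
    label = 0 if x[0, 0] < 0 else 1          # step 1: one point, one label
    y = np.zeros((2, 1))
    y[label, 0] = 1.0                        # one-hot target for training
    return x, y, label

samples = [make_example() for _ in range(1200)]            # step 2: gather data
training_data   = [(x, y) for x, y, _ in samples[:1000]]
validation_data = [(x, lab) for x, _, lab in samples[1000:]]

# Steps 4 and 5: choose a small network, then tune eta on the validation set.
for eta in (0.5, 3.0):
    net = Network([2, 4, 2])
    net.SGD(training_data, 5, 10, eta)
    print("eta =", eta, "validation score:",
          net.evaluate(validation_data), "/", len(validation_data))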

A further term used in connection with supervised learning is classification. A wide variety of classifiers is available, each with its strengths and weaknesses. Classifier performance depends greatly on the characteristics of the data to be classified. No single classifier performs best on all given problems; this is known as the "no free lunch" theorem. Various empirical rules are used to compare classifiers and to find the data characteristics that determine their performance. Choosing the right classifier for a given problem remains more an art than a science.

The most widely used classifiers at present include artificial neural networks, support vector machines, the nearest-neighbour method, Gaussian mixture models, naive Bayes, decision trees, and radial basis function classifiers.
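Of these, the nearest-neighbour method is simple enough to sketch in a few lines (my own illustration): predict the label of whichever training point lies closest to the query.

import numpy as np

def nearest_neighbour(train_x, train_y, query):
    """train_x: (n, d) points; train_y: (n,) labels; query: (d,) point."""
    distances = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(distances)]

train_x = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
train_y = np.array([0, 0, 1, 1])
print(nearest_neighbour(train_x, train_y, np.array([8.5, 9.5])))   # prints 1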

The program above sits squarely in this broad category, where a great many 'machine learning' methods have in fact been proposed! Who knows which will take the crown in the future?