"On Li Ning's Secluded Dwelling" (《題李凝幽居》), by Jia Dao of the Tang

Living in seclusion, with few neighbours near, a grassy path leads into the deserted garden.
A bird roosts in a tree beside the pool; a monk knocks at a gate beneath the moon.
Crossing the bridge divides the colours of the wild fields; shifting a stone stirs the roots of the clouds.
Gone for a while, I shall yet return here; I will not go back on my word about this secluded retreat.
In the sixth year of the Yuanhe era (811), Jia Dao called on Han Yu, whose deep appreciation his poetry won. Jia Dao was a famous poet of the "bitter chanting" (kuyin) school, and the well-known allusion tuiqiao ("push or knock") originates with him. Legend has it that while riding a donkey in Chang'an he was chanting the line "A bird roosts in a tree beside the pool; a monk knocks at a gate beneath the moon", unable to decide between the characters for "push" (tui) and "knock" (qiao); later generations therefore came to call the careful weighing of words tuiqiao. During his time in Han Yu's circle, Jia Dao exchanged poems closely with Zhang Ji, Meng Jiao, Ma Dai, and Yao He. He excelled at five-character regulated verse, and bitter chanting became a habit with him. His diction is strikingly unusual and leaves a deep impression; he often depicts desolate, cold scenes and expresses sorrowful, solitary feelings, as in lines such as "Walking alone, my shadow on the pool's bed; resting often, my body by the trees" and "The returning clerk seals the night keys; a gliding snake enters the old paulownia". Such painstakingly wrought lines make up his strange, spare, austere style, which leaves an impression of withered stillness and gloom. Yet he also wrote poems that find a pure beauty within solitude, and poems whose language is plain and natural, whose feeling is sincere and direct, and whose style is bold and vigorous.
Legend has it that while absorbed in "push" versus "knock", Jia Dao unwittingly blundered straight into Han Yu's sedan procession, and thereby received instruction in a single character (a "one-character teacher"). Han Yu reasoned that since "A bird roosts in a tree beside the pool" places the scene at night, to "push" the door straight open might seem presumptuous; better first to "knock".
Weighing words, tempering phrases, trying out arrangements of imagery: such was Jia Dao's method of "bitter chanting". It is, in truth, one with the "foremost method under heaven": trial and error.
Trial and error
Trial and error is a fundamental method of solving problems.[1] It is characterised by repeated, varied attempts which are continued until success,[2] or until the agent stops trying.
According to W.H. Thorpe, the term was devised by C. Lloyd Morgan after trying out similar phrases “trial and failure” and “trial and practice”.[3] Under Morgan’s Canon, animal behaviour should be explained in the simplest possible way. Where behaviour seems to imply higher mental processes, it might be explained by trial-and-error learning. An example is the skillful way in which his terrier Tony opened the garden gate, easily misunderstood as an insightful act by someone seeing the final behaviour. Lloyd Morgan, however, had watched and recorded the series of approximations by which the dog had gradually learned the response, and could demonstrate that no insight was required to explain it.
Edward Thorndike showed how to manage a trial-and-error experiment in the laboratory. In his famous experiment, a cat was placed in a series of puzzle boxes in order to study the law of effect in learning.[4] He plotted learning curves which recorded the timing for each trial. Thorndike’s key observation was that learning was promoted by positive results, which was later refined and extended by B.F. Skinner‘s operant conditioning.
Trial and error is also a heuristic method of problem solving, repair, tuning, or obtaining knowledge. In the field of computer science, the method is called generate and test. In elementary algebra, when solving equations, it is “guess and check”.
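To make "generate and test" concrete, here is a toy sketch (the pair of simultaneous equations is purely illustrative): systematically generate integer candidates and keep the first pair that passes the test.

```python
def generate_and_test():
    """Find an integer solution of x^2 + y = 11 and x + y^2 = 7
    by enumerating candidates and testing each one."""
    for x in range(-10, 11):
        for y in range(-10, 11):
            # the "test" step: does this candidate satisfy both equations?
            if x**2 + y == 11 and x + y**2 == 7:
                return x, y
    return None

print(generate_and_test())  # (3, 2)
```

The same loop structure underlies "guess and check" in elementary algebra; only the generator and the test change from problem to problem.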
This approach can be seen as one of the two basic approaches to problem solving, contrasted with an approach using insight and theory. However, there are intermediate methods which, for example, use theory to guide the search, an approach known as guided empiricism.
Trial with PC
Such is the gateway in. Now let us hear Michael Nielsen expound the spirit of "trial and error":
Using rectified linear units: The network we’ve developed at this point is actually a variant of one of the networks used in the seminal 1998 paper*
*“Gradient-based learning applied to document recognition”, by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner (1998). There are many differences of detail, but broadly speaking our network is quite similar to the networks described in the paper.
introducing the MNIST problem, a network known as LeNet-5. It’s a good foundation for further experimentation, and for building up understanding and intuition. In particular, there are many ways we can vary the network in an attempt to improve our results.
As a beginning, let’s change our neurons so that instead of using a sigmoid activation function, we use rectified linear units. That is, we’ll use the activation function f(z) = max(0, z). We’ll train for 60 epochs, with a learning rate of η = 0.03. I also found that it helps a little to use some l2 regularization, with regularization parameter λ = 0.1:
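As an aside, the rectified linear function itself is a one-liner; a standalone NumPy sketch (relu and sigmoid here are illustrative helpers, unrelated to the book's network3 module):

```python
import numpy as np

def relu(z):
    """Rectified linear activation: f(z) = max(0, z)."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid activation, for comparison."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))     # negative inputs are clamped to zero
print(sigmoid(z))  # smoothly squashed into (0, 1)
```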
>>> from network3 import ReLU
>>> net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                      filter_shape=(20, 1, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                      filter_shape=(40, 20, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU),
        SoftmaxLayer(n_in=100, n_out=10)],
        mini_batch_size)
>>> net.SGD(training_data, 60, mini_batch_size, 0.03,
            validation_data, test_data, lmbda=0.1)
I obtained a modest improvement in classification accuracy over the sigmoid results. Moreover, across all my experiments I found that networks based on rectified linear units consistently outperformed networks based on sigmoid activation functions. There appears to be a real gain in moving to rectified linear units for this problem.
What makes the rectified linear activation function better than the sigmoid or tanh functions? At present, we have a poor understanding of the answer to this question. Indeed, rectified linear units have only begun to be widely used in the past few years. The reason for that recent adoption is empirical: a few people tried rectified linear units, often on the basis of hunches or heuristic arguments*
*A common justification is that max(0, z) doesn’t saturate in the limit of large z, unlike sigmoid neurons, and this helps rectified linear units continue learning. The argument is fine, as far as it goes, but it’s hardly a detailed justification, more of a just-so story. Note that we discussed the problems with saturation back in Chapter 2.
They got good results classifying benchmark data sets, and the practice has spread. In an ideal world we’d have a theory telling us which activation function to pick for which application. But at present we’re a long way from such a world. I should not be at all surprised if further major improvements can be obtained by an even better choice of activation function. And I also expect that in coming decades a powerful theory of activation functions will be developed. Today, we still have to rely on poorly understood rules of thumb and experience.
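The saturation argument from the footnote above is easy to check numerically; a minimal sketch (the helper names are illustrative): for large z the sigmoid's derivative s(z)(1 - s(z)) collapses toward zero, while the rectified linear unit's derivative stays at 1 for every positive z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_prime(z):
    # derivative of max(0, z): 0 for z < 0, 1 for z > 0
    return (z > 0).astype(float)

zs = np.array([1.0, 5.0, 20.0])
print(sigmoid_prime(zs))  # shrinks rapidly toward 0 as z grows
print(relu_prime(zs))     # stays at 1 for all positive z
```

Since gradient-descent updates are proportional to these derivatives, a saturated sigmoid neuron barely learns, which is the heuristic case for rectified linear units.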
Expanding the training data: Another way we may hope to improve our results is by algorithmically expanding the training data. A simple way of expanding the training data is to displace each training image by a single pixel, either up one pixel, down one pixel, left one pixel, or right one pixel. We can do this by running the program expand_mnist.py from the shell prompt*
*The code for expand_mnist.py is available here.
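The listing of expand_mnist.py is not reproduced above; what follows is only a minimal sketch of the one-pixel-displacement idea, assuming each image is a 28x28 NumPy array (expand_image is a hypothetical helper, not the book's actual program):

```python
import numpy as np

def expand_image(image):
    """Return the four one-pixel translations of a 28x28 image
    (up, down, left, right), zero-filling the vacated row/column."""
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    expanded = []
    for dr, dc in shifts:
        shifted = np.roll(image, (dr, dc), axis=(0, 1))
        # np.roll wraps pixels around, so blank the row/column that wrapped
        if dr == -1:
            shifted[-1, :] = 0
        elif dr == 1:
            shifted[0, :] = 0
        if dc == -1:
            shifted[:, -1] = 0
        elif dc == 1:
            shifted[:, 0] = 0
        expanded.append(shifted)
    return expanded
```

Keeping each original image alongside its four shifted copies quintuples the training set, turning MNIST's 50,000 training images into 250,000.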
[A table of classification accuracies for the expanded training data, and a figure of the network's misclassified test images, appeared here.]

In that context, the few clear errors here seem quite understandable. Even a careful human makes the occasional mistake. And so I expect that only an extremely careful and methodical human would do much better. Our network is getting near to human performance.
───