W!o+'s 《小伶鼬工坊演義》: Neural Networks, Hyper-parameter Evaluation (II)

People say the peacock flies southeast

《孔雀東南飛》 (The Peacock Flies Southeast)

In the Jian'an era at the end of the Han, Liu, the wife of Jiao Zhongqing, a minor clerk of Lujiang prefecture, was sent away by Zhongqing's mother and vowed never to remarry. Her own family pressed her, and she drowned herself. When Zhongqing heard of it, he too hanged himself from a tree in the courtyard. People of the time grieved for them and composed this poem.

孔雀東南飛,五里一徘徊。十三能織素,十四學裁衣,十五彈箜篌 ,十六誦詩書,十七為君婦,心中常悲苦。君既為府吏,守節情不移。賤妾留空房,相見長日稀。雞鳴入機織,夜夜不得息,三日斷五疋,大人故嫌遲。非為織作遲,君家婦難為。妾不堪驅使,徒留無所施。便可白公姥,及時相遣歸。府吏得聞之,堂上啟阿母:兒已薄祿相,幸復得此婦。結髮共枕席,黃泉共為友,共事二三年,始爾未為久。女行無偏斜,何意致不厚?阿母謂府吏,何乃太區區 !此婦無禮節,舉動自專由。吾意久懷忿,汝豈得自由。東家有賢女,自名秦羅敷。可憐體無比,阿母為汝求,便可速遣之,遣去慎莫留。府吏長跪告,伏惟啟阿母。今若遣此婦,終老不復取。阿母得聞之,槌床便大怒:小子無所畏,何敢助婦語。吾已失恩義,會不相從許。府吏默無聲,再拜還入戶。舉言謂新婦,哽咽不能語。我自不驅卿,逼迫有阿母。卿但暫還家,吾今且報府。不久當歸還 ,還必相迎取。以此下心意,慎勿違吾語。新婦謂府吏,勿復重紛紜。往昔初陽歲,謝家來貴門。奉事循公姥,進止敢自專,晝夜勤作息,伶娉縈苦辛。謂言無罪過,供養卒大恩。仍更被驅遣,何言復來還。妾有繡腰襦,葳蕤自生光。紅羅複斗帳,四角垂香囊。箱簾六七十,綠碧青絲繩。物物各自異,種種在其中。人賤物亦鄙,不足迎後人。留待作遣施,於今無會因,時時為安慰,久久莫相忘 。雞鳴外欲曙,新婦起嚴妝,著我繡裌裙,事事四五通,足下躡絲履,頭上玳瑁光。腰若流紈素,耳著明月璫。指如削蔥根,口如含朱丹,纖纖作細步,精妙世無雙。上堂謝阿母,母聽去不止。昔作女兒時,生小出野里,本自無教訓,兼愧貴家子。受母錢帛多,不堪母驅使,今日還家去,令母勞家裡。卻與小姑別,淚落連珠子。新婦初來時,小姑始扶床,今日被驅遣,小姑如我長,勤心養公姥 ,好自相扶將。初七及下九,嬉戲莫相忘。出門登車去,涕落百餘行。府吏馬在前,新婦車在後。隱隱何田田,俱會大道口。下馬入車中,低頭共耳語。誓不相隔卿,且暫還家去。吾今且赴府。不久當還歸,誓天不相負。新婦謂府吏:感君區區懷,君既若見錄,不久望君來。君當作磐石,妾當作蒲葦,蒲葦紉如絲,磐石無轉移。我有親父兄,性行暴如雷。恐不任我意,逆以煎我懷。舉手長勞勞 ,二情同依依。入門上家堂,進退無顏儀。十七遣汝嫁,謂言無誓違。汝今無罪過,不迎而自歸。蘭芝慚阿母:兒實無罪過。阿母大悲摧。還家十餘日,縣令遣媒來。云有第三郎,窈窕世無雙。年始十八九,便言多令才。阿母謂阿女:汝可去應之。阿女銜淚答:蘭芝初還時,府吏見丁寧,結誓不別離。今日違情義,恐此事非奇。
自可斷來信,徐徐更謂之。阿母白媒人,貧賤有此女,始適還家門 。不堪吏人婦,豈合令郎君。幸可廣問訊,不得便相許。媒人去數日,尋遣丞請還。誰有蘭家女,丞籍有宦官。云有第五郎,嬌逸未有婚。遣丞為媒人,主簿通語言。直說太守家,有此令郎君。既欲結大義,故遣來貴門。阿母謝媒人:女子先有誓,老姥豈敢言。阿兄得聞之,悵然心中煩。舉言謂阿妹:作計何不量?先嫁得府吏,後嫁得郎君。否泰如天地,足以榮汝身。不嫁義郎體,其住欲何云 。蘭芝仰頭答,理實如兄言。謝家事夫婿,中道還兄門。處分適兄意,那得自任專。雖與府吏要,渠會永無緣。登即相許和,便可作婚姻。媒人下床去,諾諾復爾爾。還部白府君,下官奉使命。言談大有緣。府君得聞之,心中大歡喜。視曆復開書,便利此月內。六合正相應,良吉三十日。今已二十七。卿可去成婚。交語連裝束,絡繹如浮雲。青雀白鵠舫,四角龍子幡。婀娜隨風轉,金車玉作輪 。躑躅青驄馬,流蘇金縷鞍。齋錢三百萬,皆用青絲穿。雜綵三百匹,交廣市鮭珍。從人四五百,鬱鬱登郡門。阿母謂阿女:適得府君書,明日來迎汝。何不作衣裳,莫令事不舉。阿女默無聲,手巾掩口啼,淚落便如瀉。移我琉璃榻,出置前窗下。左手持刀尺,朝成繡裌裙,晚成單羅衫。晻晻日欲暝。愁思出門啼。府吏聞此變,因求假暫歸。右手執綾羅。未至二三里,摧藏馬悲哀。新婦識馬聲 ,躡履相逢迎。悵然遙相望,知是故人來。舉手拍馬鞍,嗟歎使心傷。自君別我後,人事不可量。果不如先願,又非君所詳。我有親父母,逼迫兼弟兄,以我應他人,君還何所望。府吏謂新婦:賀卿得高遷。磐石方且厚,可以卒千年。蒲葦一時紉,便作旦夕間。卿當日勝貴,吾獨向黃泉。新婦謂府吏:何意出此言。同是被逼迫,君爾妾亦然。黃泉下相見,勿違今日言。執手分道去,各各還家門 。生人作死別,恨恨那可論。念與世間辭,千萬不復全。府吏還家去,上堂拜阿母:今日大風寒,寒風摧樹木,嚴霜結庭蘭。兒今日冥冥,令母在後單。故作不良計,勿復怨鬼神。命如南山石,四體康且直。阿母得聞之,零淚應聲落。汝是大家子,仕宦於臺閣。慎勿為婦死,貴賤情何薄。東家有賢女,窈窕艷城郭。阿母為汝求,便復在旦夕。府吏再拜還,長歎空房中,作計乃爾立。轉頭向戶裡 ,漸見愁煎迫。其日牛馬嘶。新婦入青廬。菴菴黃昏後,寂寂人定初。我命絕今日,魂去尸長留。攬裙脫絲履,舉身赴清池。府吏聞此事,心知長別離。徘徊庭樹下,自掛東南枝。兩家求合葬,合葬華山傍。東西值松柏,左右種梧桐。枝枝相覆蓋,葉葉相交通。中有雙飛鳥,自名為鴛鴦。仰頭相向鳴,夜夜達五更。行人駐足聽,
寡婦起傍徨。多謝後世人,戒之慎勿忘。


Only because, in the northwest, there stands a tall tower

Nineteen Old Poems, No. 5: 《西北有高樓》 (A Tall Tower in the Northwest)

西北有高樓,上與浮雲齊。
交疏結綺窗,阿閣三重階。
上有弦歌聲,音響一何悲!
誰能爲此曲,無乃杞梁妻。
清商隨風發,中曲正徘徊。
一彈再三歎,慷慨有餘哀。
不惜歌者苦,但傷知音稀。
願爲雙鴻鵠,奮翅起高飛。

In Fuxi's Earlier-Heaven arrangement, Gen the Mountain stands in the northwest and Dui the Lake in the southeast. Heaven-made and earth-set, mountain and lake long for one another; what, then, could stir up quarrel and blame? Who knew that fate is dealt by the Later-Heaven chart: Qian in the northwest now meets Xun in the southeast, heaven over wind forms the hexagram Gou and slander arises, and King Wen's eight trigrams turn the wheel of circumstance!

Mr. Michael Nielsen chooses an example he knows backwards and forwards to expound the 'broad strategy':

Broad strategy: When using neural networks to attack a new problem the first challenge is to get any non-trivial learning, i.e., for the network to achieve results better than chance. This can be surprisingly difficult, especially when confronting a new class of problem. Let’s look at some strategies you can use if you’re having this kind of trouble.

Is it, perhaps, because one has already studied the 'Earlier-Heaven' Yi? Or because one knows full well, before ever trying, how hard it is to 'climb the heights'?

Suppose, for example, that you’re attacking MNIST for the first time. You start out enthusiastic, but are a little discouraged when your first network fails completely, as in the example above. The way to go is to strip the problem down. Get rid of all the training and validation images except images which are 0s or 1s. Then try to train a network to distinguish 0s from 1s. Not only is that an inherently easier problem than distinguishing all ten digits, it also reduces the amount of training data by 80 percent, speeding up training by a factor of 5. That enables much more rapid experimentation, and so gives you more rapid insight into how to build a good network.
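As a concrete illustration of this stripping-down step, here is a minimal sketch of filtering the data to 0s and 1s. It assumes the mnist_loader module from the book's code repository, where each training pair (x, y) carries a 10-dimensional one-hot label while validation and test pairs carry plain integer labels; the helper name keep_zeros_and_ones is my own.

import numpy as np
import mnist_loader  # data loader from the book's code repository

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

def keep_zeros_and_ones(data, one_hot):
    """Keep only the images labelled 0 or 1.  `one_hot` is True for the
    training set, whose labels are 10-dimensional one-hot vectors; the
    validation and test sets use plain integer labels."""
    if one_hot:
        return [(x, y) for (x, y) in data if np.argmax(y) in (0, 1)]
    return [(x, y) for (x, y) in data if y in (0, 1)]

training_01 = keep_zeros_and_ones(list(training_data), one_hot=True)
validation_01 = keep_zeros_and_ones(list(validation_data), one_hot=False)

Roughly a fifth of the 50,000 training images survive the filter, which is where the "reduces the amount of training data by 80 percent" and the factor-of-five speed-up in the paragraph above come from.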

You can further speed up experimentation by stripping your network down to the simplest network likely to do meaningful learning. If you believe a [784, 10] network can likely do better-than-chance classification of MNIST digits, then begin your experimentation with such a network. It’ll be much faster than training a [784, 30, 10] network, and you can build back up to the latter.

You can get another speed up in experimentation by increasing the frequency of monitoring. In network2.py we monitor performance at the end of each training epoch. With 50,000 images per epoch, that means waiting a little while – about ten seconds per epoch, on my laptop, when training a [784, 30, 10] network – before getting feedback on how well the network is learning. Of course, ten seconds isn’t very long, but if you want to trial dozens of hyper-parameter choices it’s annoying, and if you want to trial hundreds or thousands of choices it starts to get debilitating. We can get feedback more quickly by monitoring the validation accuracy more often, say, after every 1,000 training images. Furthermore, instead of using the full 10,000 image validation set to monitor performance, we can get a much faster estimate using just 100 validation images. All that matters is that the network sees enough images to do real learning, and to get a pretty good rough estimate of performance. Of course, our program network2.py doesn’t currently do this kind of monitoring. But as a kludge to achieve a similar effect for the purposes of illustration, we’ll strip down our training data to just the first 1,000 MNIST training images. Let’s try it and see what happens. (To keep the code below simple I haven’t implemented the idea of using only 0 and 1 images. Of course, that can be done with just a little more work.)

>>> net = network2.Network([784, 10])
>>> net.SGD(training_data[:1000], 30, 10, 10.0, lmbda = 1000.0, \
... evaluation_data=validation_data[:100], \
... monitor_evaluation_accuracy=True)
Epoch 0 training complete
Accuracy on evaluation data: 10 / 100

Epoch 1 training complete
Accuracy on evaluation data: 10 / 100

Epoch 2 training complete
Accuracy on evaluation data: 10 / 100
...

We’re still getting pure noise! But there’s a big win: we’re now getting feedback in a fraction of a second, rather than once every ten seconds or so. That means you can more quickly experiment with other choices of hyper-parameter, or even conduct experiments trialling many different choices of hyper-parameter nearly simultaneously.
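network2.py has no built-in option for this finer-grained monitoring, but a rough sketch of what it might look like follows. It assumes the Network class's update_mini_batch(mini_batch, eta, lmbda, n) and accuracy(data) methods from the book's repository; the wrapper sgd_with_frequent_monitoring is my own illustration, not part of network2.py.

import random
import network2

def sgd_with_frequent_monitoring(net, training_data, epochs, mini_batch_size,
                                 eta, lmbda, evaluation_data,
                                 monitor_every=1000):
    """Sketch only: report validation accuracy every `monitor_every`
    training images instead of once per epoch."""
    training_data = list(training_data)  # shuffling and len() need a list
    n = len(training_data)
    seen = 0
    for epoch in range(epochs):
        random.shuffle(training_data)
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            net.update_mini_batch(mini_batch, eta, lmbda, n)
            seen += len(mini_batch)
            if seen % monitor_every == 0:
                print("Seen {0} images: accuracy {1} / {2}".format(
                    seen, net.accuracy(evaluation_data), len(evaluation_data)))

Called with, say, net = network2.Network([784, 10]), the first 1,000 training images, and 100 validation images, this gives the same kind of feedback as above, but on a per-thousand-images basis rather than per epoch.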

In the above example I left λ at λ = 1000.0, as we used earlier. But since we changed the number of training examples we should really change λ to keep the weight decay the same. That means changing λ to 20.0.
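The arithmetic behind the 20.0: in the L2-regularized weight update used by network2.py, w → (1 − ηλ/n)w − η ∂C/∂w (up to the usual mini-batch averaging), the decay factor depends only on the ratio λ/n. Shrinking the training set from 50,000 images to 1,000 therefore means shrinking λ in the same proportion, λ_new = λ_old × n_new / n_old = 1000.0 × 1,000 / 50,000 = 20.0.

If we do that then this is what happens: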

>>> net = network2.Network([784, 10])
>>> net.SGD(training_data[:1000], 30, 10, 10.0, lmbda = 20.0, \
... evaluation_data=validation_data[:100], \
... monitor_evaluation_accuracy=True)
Epoch 0 training complete
Accuracy on evaluation data: 12 / 100

Epoch 1 training complete
Accuracy on evaluation data: 14 / 100

Epoch 2 training complete
Accuracy on evaluation data: 25 / 100

Epoch 3 training complete
Accuracy on evaluation data: 18 / 100
...


Ahah! We have a signal. Not a terribly good signal, but a signal nonetheless. That’s something we can build on, modifying the hyper-parameters to try to get further improvement. Maybe we guess that our learning rate needs to be higher. (As you perhaps realize, that’s a silly guess, for reasons we’ll discuss shortly, but please bear with me.) So to test our guess we try dialing η up to 100.0:

>>> net = network2.Network([784, 10])
>>> net.SGD(training_data[:1000], 30, 10, 100.0, lmbda = 20.0, \
... evaluation_data=validation_data[:100], \
... monitor_evaluation_accuracy=True)
Epoch 0 training complete
Accuracy on evaluation data: 10 / 100

Epoch 1 training complete
Accuracy on evaluation data: 10 / 100

Epoch 2 training complete
Accuracy on evaluation data: 10 / 100

Epoch 3 training complete
Accuracy on evaluation data: 10 / 100

...


That’s no good! It suggests that our guess was wrong, and the problem wasn’t that the learning rate was too low. So instead we try dialing η down to η = 1.0:

>>> net = network2.Network([784, 10])
>>> net.SGD(training_data[:1000], 30, 10, 1.0, lmbda = 20.0, \
... evaluation_data=validation_data[:100], \
... monitor_evaluation_accuracy=True)
Epoch 0 training complete
Accuracy on evaluation data: 62 / 100

Epoch 1 training complete
Accuracy on evaluation data: 42 / 100

Epoch 2 training complete
Accuracy on evaluation data: 43 / 100

Epoch 3 training complete
Accuracy on evaluation data: 61 / 100

...


That’s better! And so we can continue, individually adjusting each hyper-parameter, gradually improving performance. Once we’ve explored to find an improved value for η, then we move on to find a good value for λ. Then experiment with a more complex architecture, say a network with 10 hidden neurons. Then adjust the values for η and λ again. Then increase to 20 hidden neurons. And then adjust other hyper-parameters some more. And so on, at each stage evaluating performance using our held-out validation data, and using those evaluations to find better and better hyper-parameters. As we do so, it typically takes longer to witness the impact due to modifications of the hyper-parameters, and so we can gradually decrease the frequency of monitoring.
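One way to mechanize this "adjust one hyper-parameter at a time" loop is a simple one-dimensional sweep scored on the validation data. The sketch below assumes that network2.Network.SGD returns the tuple (evaluation_cost, evaluation_accuracy, training_cost, training_accuracy), as in the book's repository; the helper sweep_eta and the candidate values are my own illustration.

import network2

def sweep_eta(candidates, training_data, validation_data,
              lmbda=20.0, epochs=5, mini_batch_size=10):
    """Train a small [784, 10] network once per candidate eta and record the
    best validation accuracy seen (a count out of len(validation_data))."""
    scores = {}
    for eta in candidates:
        net = network2.Network([784, 10])
        _, eval_acc, _, _ = net.SGD(training_data, epochs, mini_batch_size, eta,
                                    lmbda=lmbda,
                                    evaluation_data=validation_data,
                                    monitor_evaluation_accuracy=True)
        scores[eta] = max(eval_acc)
    return scores

# e.g. sweep_eta([0.25, 1.0, 4.0], training_data[:1000], validation_data[:100])

The same pattern can then be repeated for λ, then for the number of hidden neurons, re-sweeping η and λ after each architectural change, exactly as described above.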

This all looks very promising as a broad strategy. However, I want to return to that initial stage of finding hyper-parameters that enable a network to learn anything at all. In fact, even the above discussion conveys too positive an outlook. It can be immensely frustrating to work with a network that’s learning nothing. You can tweak hyper-parameters for days, and still get no meaningful response. And so I’d like to re-emphasize that during the early stages you should make sure you can get quick feedback from experiments. Intuitively, it may seem as though simplifying the problem and the architecture will merely slow you down. In fact, it speeds things up, since you much more quickly find a network with a meaningful signal. Once you’ve got such a signal, you can often get rapid improvements by tweaking the hyper-parameters. As with many things in life, getting started can be the hardest thing to do.

Okay, that’s the broad strategy. Let’s now look at some specific recommendations for setting hyper-parameters. I will focus on the learning rate, η, the L2 regularization parameter, λ, and the mini-batch size. However, many of the remarks apply also to other hyper-parameters, including those associated to network architecture, other forms of regularization, and some hyper-parameters we’ll meet later in the book, such as the momentum co-efficient.

───


Suppose we take, from Journey to the West,

Chapter 68: In the Scarlet-Purple Kingdom the Tang Monk Discourses on Former Lives; Pilgrim Sun Plays the Physician of the Thrice-Broken Arm

We, king of the Scarlet-Purple Kingdom in West Aparagodaniya: since We founded Our rule, the four quarters have been at peace and the people content. Of late, owing to inauspicious affairs of state, We have lain long upon the sickbed with a lingering illness that will not heal. Our Imperial Medical Academy has repeatedly selected fine prescriptions, yet none has restored Us. We therefore issue this proclamation to invite worthy men from all under heaven: whether they travel from the north or come from the east, whether of the Middle Land or of foreign countries, let whoever is skilled in medicine ascend the precious hall and treat Our royal person. Should the illness be relieved even a little, We will divide the altars of state with him in equal shares; this is no empty promise. For this purpose the notice is issued and posted, that it may reach all concerned.

Having read it through, he said, delighted: "The ancients say, 'Stirring abroad brings three parts of fortune.' A good thing I was not sitting idle back at the hostel. There is no need now to buy any seasonings; let the scripture-fetching wait a day while old Monkey has some fun playing physician."

as our model, wishing to use a 'neural network' to perform Sun Wukong's feat of taking the pulse:

Palpation (切診)

Palpation comprises two parts, pulse-taking (脈診) and body palpation (按診). It is the method by which the physician uses both hands to touch, feel, and press particular areas of the patient's body in order to learn about the illness. Pulse-taking is the feeling of the pulse; body palpation is the touching and pressing of the patient's skin, limbs, and affected areas to detect local cold or heat, softness or hardness, tenderness, lumps, or other abnormal changes, and from these to infer the location and nature of the disease.

Common pathological pulses

Generally speaking, a healthy person's pulse beats about four times per breath, and the pulse at all three positions (cun, guan, chi) is moderate and forceful, neither floating nor sunken. The common pathological pulses mainly include the following (a toy sketch of feeding such readings to a small network follows the list):

  • Floating pulse (浮脈): "floating" means the pulse lies at the surface; it is felt with light pressure and weakens under heavy pressure. In exterior patterns the pathogenic influence lodges at the surface, so the pulse qi is stirred outward and the pulse position is shallow. Floating and forceful indicates exterior excess; floating and forceless indicates exterior deficiency. A floating, large, forceless pulse, with deficient yang drifting outward, marks a critical condition.
  • Sunken pulse (沉脈): the opposite of the floating pulse; it cannot be felt with light pressure. It mainly indicates interior patterns; as with the floating pulse, forceful means interior excess and forceless means interior deficiency.
  • Rapid pulse (數脈): the opposite of the slow pulse; the pulse beats quickly, more than ninety beats per minute. It mainly indicates heat patterns: forceful for excess heat, forceless for deficiency heat.
  • Slow pulse (遲脈): "slow" means the pulse beats sluggishly, fewer than sixty beats per minute on average. It mainly indicates cold patterns: forceful for excess cold, forceless for deficiency cold.
  • Vacuous pulse (虛脈): the pulse is forceless at all three positions (cun, guan, chi) and feels empty under heavy pressure. It mainly indicates deficiency patterns.
  • Replete pulse (實脈): the pulse is forceful at all three positions. It mainly indicates excess patterns.
  • Slippery pulse (滑脈): the pulse feels smooth and rolling, like pearls slipping under the fingers, and flows freely; it is a sign of healthy, abundant qi and blood. A slippery and rapid pulse is the classic sign of pregnancy.
  • Surging pulse (洪脈): "surging" means like a flood; the pulse is large and forceful, arriving like a breaking wave and departing with declining force. It mainly indicates exuberant heat.
  • Fine pulse (細脈): the pulse is thin as a thread, yet its rise and fall are distinct. It mainly indicates deficiency patterns.
  • Wiry pulse (弦脈): the pulse feels like a taut instrument string under the fingers. It mainly indicates disorders of the liver and gallbladder, pain patterns, and phlegm-fluid retention.
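To connect this list back to the "take Sun Wukong's pulse with a neural network" idea above, one could imagine encoding a pulse reading as a tiny feature vector and feeding it to a network2-style classifier. Everything below (the feature choices, the class list, and the helper encode_pulse) is a toy of my own, not a clinical scheme; an untrained network will of course answer at random.

import numpy as np
import network2

# Hypothetical pulse categories, loosely matching the list above.
PULSE_CLASSES = ["floating", "sunken", "rapid", "slow", "vacuous", "replete"]

def encode_pulse(beats_per_minute, depth, strength):
    """Encode a reading as a column vector: rate (scaled), depth in
    [0 = superficial, 1 = deep], strength in [0 = weak, 1 = forceful]."""
    return np.array([[beats_per_minute / 100.0], [depth], [strength]])

net = network2.Network([3, 10, len(PULSE_CLASSES)])   # untrained toy network
x = encode_pulse(beats_per_minute=95, depth=0.2, strength=0.8)
print(PULSE_CLASSES[int(np.argmax(net.feedforward(x)))])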

───

隨縁居

郭琛, 2011-06-12, Germany

The Twenty-Eight Pulses (二十八脈)

The floating category: floating (浮), surging (洪), soggy (濡), scattered (散), hollow (芤), and leather (革) pulses. [A waveform diagram followed in the original, showing the pulse as felt at the lifting (舉), searching (尋), and pressing (按) depths.] The defining feature of this category is that the pulse is far more pronounced on lifting than on searching or pressing.



What, then, are we to do?