【Ding Ge ‧ Ge Ding】: Raspbian Stretch 《6, K.3: Speech Interface, 5 right》

Because judy uses a different microphone, runs in a Python 2 environment, and it is unclear how much the old and new versions of pocketsphinx differ:

pi@raspberrypi:~ $ apt-cache show pocketsphinx
Package: pocketsphinx
Version: 0.8+5prealpha-2
Architecture: armhf
Maintainer: Debian Accessibility Team <debian-accessibility@lists.debian.org>
Installed-Size: 8667
Depends: libc6 (>= 2.4), libpocketsphinx3, libsphinxbase3, libjs-jquery
Multi-Arch: foreign
Homepage: http://cmusphinx.sourceforge.net/
Priority: extra
Section: sound
Filename: pool/main/p/pocketsphinx/pocketsphinx_0.8+5prealpha-2_armhf.deb
Size: 508836
SHA256: feeca4c0385a2fd5274ce21f9f3153bd217bbab1d1001174cea6bab8f54080fd
SHA1: 0fd4274deae8e8d3cc03bd1e7045cc6ad868f91b
MD5sum: 72894950089cb773826e6e90b9c94850
Description: Speech recognition tool
 CMU Sphinx is a large vocabulary, speaker-independent continuous speech
 recognition engine.
 .
 This package contains end-user speech recognition tools.
Description-md5: 17fb8ee80efcb04aa65b542b1b312aa2

For that reason, the usage and testing of that library are not covered here. We only borrow its sample control text:

How are you today
Good morning
night
afternoon

Let's take a tour of the tool!

Upload the text:

Sphinx Knowledge Base Tool -- VERSION 3 (http://www.speech.cs.cmu.edu/tools/lmtool-new.html)

This is the new version of the lmtool (http://www.speech.cs.cmu.edu/tools/lmtool.html)! FAQ: http://www.speech.cs.cmu.edu/tools/FAQ.html

Changes should be transparent (unless you automate, see note below). Problems? Please help by sending a report to the maintainer.

What it does: Builds a consistent set of lexical and language modeling files for Sphinx (and compatible) decoders.

Note: If you just need pronunciations, use the lextool (http://www.speech.cs.cmu.edu/tools/lextool.html) instead.

To use: Create a sentence corpus file, consisting of all sentences you would like the decoder to recognize. The sentences should be one to a line (but do not need to have standard punctuation). You may not need to exhaustively list all possible sentences: the decoder will allow fragments to recombine into new sentences.

Upload a sentence corpus file:

The new version of lmtool has been reorganized internally to make use of the Logios (http://svn.code.sf.net/p/cmusphinx/code/trunk/logios/) package. This will make lmtool easier to maintain in the future and will allow it to take advantage of ongoing development in Logios. These changes should be transparent to regular users. Please give it a try. If you have any problems, or discover bugs, let the maintainer know. If things look good (i.e., I stop getting bug reports) this will become the standard version.

NOTE: If you have automated the use of this tool you will need to update your code. The main difference is that the name of the target script has changed. The old script will still be available so nothing will break immediately, but it's unlikely to continue to be maintained. Also, file links are no longer tagged in the html. Please let me know if you make use of this feature and I'll find a fix.

……

Sphinx knowledge base generator [lmtool.3a] (http://www.speech.cs.cmu.edu/tools/product/1517214435_23951/)

Your Sphinx knowledge base compilation has been successfully processed!

The base name for this set is 2029. TAR2029.tgz (http://www.speech.cs.cmu.edu/tools/product/1517214435_23951/TAR2029.tgz) is the compressed version.
Note that this set of files is internally consistent and is best used together.

IMPORTANT: Please download these files as soon as possible; they will be deleted in approximately a half hour.

SESSION 1517214435_23951
[_INFO_] Found corpus: 4 sentences, 8 unique words
[_INFO_] Found 0 words in extras  (0)
[_INFO_] Language model completed  (0)
[_INFO_] Pronounce completed  (0)
[_STAT_] Elapsed time: 0.007 sec

Please include these messages in bug reports.

………

Retrieve the generated files:

wget http://www.speech.cs.cmu.edu/tools/product/1517214435_23951/TAR2029.tgz

pi@raspberrypi:~/test $ tar -zxvf TAR2029.tgz
2029.dic
2029.lm
2029.log_pronounce
2029.sent
2029.vocab
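
The sentence corpus borrowed above can be recreated and sanity-checked locally before it is uploaded. What follows is a minimal Python sketch, not part of the original walkthrough. The file name `corpus.txt` and the helper names `write_corpus` and `corpus_stats` are assumptions chosen for illustration, and folding case when counting words is a guess about how lmtool counts them. The sketch writes one sentence per line, as the lmtool page asks, and counts sentences and unique words so the result can be compared with the `Found corpus: 4 sentences, 8 unique words` line of the session log.

```python
# Sample control text quoted earlier in this post.
CORPUS_LINES = [
    "How are you today",
    "Good morning",
    "night",
    "afternoon",
]


def write_corpus(path, lines):
    """Write the corpus with one sentence per line (hypothetical helper)."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(lines) + "\n")


def corpus_stats(lines):
    """Return (sentence count, unique word count), ignoring blank lines.

    Words are compared case-insensitively; this is an assumption about
    how lmtool counts, not something its documentation states here.
    """
    sentences = [line.strip() for line in lines if line.strip()]
    words = {word.upper() for sentence in sentences for word in sentence.split()}
    return len(sentences), len(words)


if __name__ == "__main__":
    write_corpus("corpus.txt", CORPUS_LINES)  # "corpus.txt" is an assumed name
    print(corpus_stats(CORPUS_LINES))  # (4, 8)
```

The counts agree with the session log, which suggests the uploaded file held exactly these four lines.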

 

Verifying it in practice:

pi@raspberrypi:~ $ pocketsphinx_continuous -adcdev plughw:1,0 -lm ./test/2029.lm -dict ./test/2029.dic -inmic yes
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/en-us/en-us/feat.params
Current configuration:
[NAME]                  [DEFLT]         [VALUE]
-agc                    none            none
-agcthresh              2.0             2.000000e+00
-allphone
-allphone_ci            no              no
-alpha                  0.97            9.700000e-01
-ascale                 20.0            2.000000e+01
-aw                     1               1
-backtrace              no              no
-beam                   1e-48           1.000000e-48
-bestpath               yes             yes
-bestpathlw             9.5             9.500000e+00
-ceplen                 13              13
-cmn                    current         current
-cmninit                8.0             40,3,-1
-compallsen             no              no
-debug                                  0
-dict                                   ./test/2029.dic
-dictcase               no              no
-dither                 no              no
-doublebw               no              no
-ds                     1               1
-fdict                                  /usr/share/pocketsphinx/model/en-us/en-us/noisedict
-feat                   1s_c_d_dd       1s_c_d_dd
-featparams                             /usr/share/pocketsphinx/model/en-us/en-us/feat.params
-fillprob               1e-8            1.000000e-08
-frate                  100             100
-fsg
-fsgusealtpron          yes             yes
-fsgusefiller           yes             yes
-fwdflat                yes             yes
-fwdflatbeam            1e-64           1.000000e-64
-fwdflatefwid           4               4
-fwdflatlw              8.5             8.500000e+00
-fwdflatsfwin           25              25
-fwdflatwbeam           7e-29           7.000000e-29
-fwdtree                yes             yes
-hmm                                    /usr/share/pocketsphinx/model/en-us/en-us
-input_endian           little          little
-jsgf
-keyphrase
-kws
-kws_delay              10              10
-kws_plp                1e-1            1.000000e-01
-kws_threshold          1               1.000000e+00
-latsize                5000            5000
-lda
-ldadim                 0               0
-lifter                 0               22
-lm                                     ./test/2029.lm
-lmctl
-lmname
-logbase                1.0001          1.000100e+00
-logfn
-logspec                no              no
-lowerf                 133.33334       1.300000e+02
-lpbeam                 1e-40           1.000000e-40
-lponlybeam             7e-29           7.000000e-29
-lw                     6.5             6.500000e+00
-maxhmmpf               30000           30000
-maxwpf                 -1              -1
-mdef                                   /usr/share/pocketsphinx/model/en-us/en-us/mdef
-mean                                   /usr/share/pocketsphinx/model/en-us/en-us/means
-mfclogdir
-min_endfr              0               0
-mixw
-mixwfloor              0.0000001       1.000000e-07
-mllr
-mmap                   yes             yes
-ncep                   13              13
-nfft                   512             512
-nfilt                  40              25
-nwpen                  1.0             1.000000e+00
-pbeam                  1e-48           1.000000e-48
-pip                    1.0             1.000000e+00
-pl_beam                1e-10           1.000000e-10
-pl_pbeam               1e-10           1.000000e-10
-pl_pip                 1.0             1.000000e+00
-pl_weight              3.0             3.000000e+00
-pl_window              5               5
-rawlogdir
-remove_dc              no              no
-remove_noise           yes             yes
-remove_silence         yes             yes
-round_filters          yes             yes
-samprate               16000           1.600000e+04
-seed                   -1              -1
-sendump                                /usr/share/pocketsphinx/model/en-us/en-us/sendump
-senlogdir
-senmgau
-silprob                0.005           5.000000e-03
-smoothspec             no              no
-svspec                                 0-12/13-25/26-38
-tmat                                   /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
-tmatfloor              0.0001          1.000000e-04
-topn                   4               4
-topn_beam              0               0
-toprule
-transform              legacy          dct
-unit_area              yes             yes
-upperf                 6855.4976       6.800000e+03
-uw                     1.0             1.000000e+00
-vad_postspeech         50              50
-vad_prespeech          20              20
-vad_startspeech        10              10
-vad_threshold          2.0             2.000000e+00
-var                                    /usr/share/pocketsphinx/model/en-us/en-us/variances
-varfloor               0.0001          1.000000e-04
-varnorm                no              no
-verbose                no              no
-warp_params
-warp_type              inverse_linear  inverse_linear
-wbeam                  7e-29           7.000000e-29
-wip                    0.65            6.500000e-01
-wlen                   0.025625        2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4112 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: ./test/2029.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 11 words read
INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(467): Header doesn't match
INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
INFO: ngram_model_trie.c(205): LM of order 3
INFO: ngram_model_trie.c(207): #1-grams: 10
INFO: ngram_model_trie.c(207): #2-grams: 12
INFO: ngram_model_trie.c(207): #3-grams: 8
INFO: lm_trie.c(399): Training quantizer
INFO: lm_trie.c(407): Building LM trie
INFO: ngram_search_fwdtree.c(99): 10 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 7 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 7 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 144
INFO: ngram_search_fwdtree.c(339): after: 10 root, 16 non-root channels, 6 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(305): pocketsphinx_continuous COMPILED ON: May 22 2016, AT: 22:01:16

READY....
Listening...
INFO: cmn_prior.c(131): cmn_prior_update: from < 49.97  1.81 -0.92 -14.74 13.21 -4.28  4.37 -9.02  0.91 -0.52  1.94  6.10  5.16 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 51.51  4.53 -0.17 -16.11 16.84 -6.16  3.78 -7.40  2.23 -2.91  2.76  3.19  3.94 >
INFO: ngram_search_fwdtree.c(1553):      625 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1555):    14861 senones evaluated (177/fr)
INFO: ngram_search_fwdtree.c(1559):     8554 channels searched (101/fr), 800 1st, 6824 last
INFO: ngram_search_fwdtree.c(1562):      740 words for which last channels evaluated (8/fr)
INFO: ngram_search_fwdtree.c(1564):      357 candidate words for entering last phone (4/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.27 CPU 0.321 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 1.07 wall 1.275 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 9 words
INFO: ngram_search_fwdflat.c(948):      554 words recognized (7/fr)
INFO: ngram_search_fwdflat.c(950):    16178 senones evaluated (193/fr)
INFO: ngram_search_fwdflat.c(952):    11548 channels searched (137/fr)
INFO: ngram_search_fwdflat.c(954):      946 words searched (11/fr)
INFO: ngram_search_fwdflat.c(957):      477 word transitions (5/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.16 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.16 wall 0.194 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.40
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 121 nodes, 127 links
INFO: ps_lattice.c(1380): Bestpath score: -1785
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:40:82) = -103465
INFO: ps_lattice.c(1441): Joint P(O,S) = -128686 P(S|O) = -25221
INFO: ngram_search.c(875): bestpath 0.01 CPU 0.012 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
### NIGHT

READY....
Listening...

^C
pi@raspberrypi:~ $
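
The timing lines in this log can be read more easily with a little arithmetic. The Python sketch below is an illustration rather than part of the original test. It assumes that each `xRT` figure is the elapsed time divided by the duration of the decoded audio, so dividing the seconds by the `xRT` value should give back the utterance length. The names `parse_timings` and `implied_duration` are hypothetical helpers, and the embedded `LOG` string is copied from the six timing lines of the run above.

```python
import re

# The six timing lines from the recognition run above.
LOG = """\
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.27 CPU 0.321 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 1.07 wall 1.275 xRT
INFO: ngram_search_fwdflat.c(960): fwdflat 0.16 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.16 wall 0.194 xRT
INFO: ngram_search.c(875): bestpath 0.01 CPU 0.012 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
"""

TIMING = re.compile(
    r"(fwdtree|fwdflat|bestpath) (\d+\.\d+) (CPU|wall) (\d+\.\d+) xRT"
)


def parse_timings(text):
    """Map (pass name, clock kind) to (seconds, xRT) for each timing line."""
    timings = {}
    for match in TIMING.finditer(text):
        key = (match.group(1), match.group(3))
        timings[key] = (float(match.group(2)), float(match.group(4)))
    return timings


def implied_duration(seconds, xrt):
    """Audio length in seconds, assuming xRT = elapsed time / audio length."""
    return seconds / xrt


if __name__ == "__main__":
    timings = parse_timings(LOG)
    seconds, xrt = timings[("fwdtree", "CPU")]
    print(round(implied_duration(seconds, xrt), 2))
    cpu_xrt = sum(x for (_, kind), (_, x) in timings.items() if kind == "CPU")
    print(round(cpu_xrt, 3))
```

Under that assumption the `fwdtree` and `fwdflat` lines all imply an utterance of roughly 0.84 seconds, which fits the per-frame counts at `-frate 100` (14861 senones at 177 per frame is about 84 frames). The three CPU `xRT` values add up to about 0.52, so the decoding work itself appears to run faster than real time on this Raspberry Pi for such a small vocabulary.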