【鼎革‧革鼎】︰ Raspbian Stretch 《六之 K.3-言語界面-7.0 》

Let us return to the topic we set aside earlier:

Speech recognition module for Python, supporting several engines and APIs, online and offline. https://pypi.python.org/pypi/SpeechRe…

 

After reading the documentation carefully and installing with sudo pip3 install SpeechRecognition, first confirm that the SpeechRecognition library can find your microphone:

Troubleshooting

The recognizer hangs on recognizer_instance.listen; specifically, when it’s calling Microphone.MicrophoneStream.read.

This usually happens when you’re using a Raspberry Pi board, which doesn’t have audio input capabilities by itself. This causes the default microphone used by PyAudio to simply block when we try to read it. If you happen to be using a Raspberry Pi, you’ll need a USB sound card (or USB microphone).

Once you do this, change all instances of Microphone() to Microphone(device_index=MICROPHONE_INDEX), where MICROPHONE_INDEX is the hardware-specific index of the microphone.

To figure out what the value of MICROPHONE_INDEX should be, run the following code:

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import speech_recognition as sr

In [2]: for index, name in enumerate(sr.Microphone.list_microphone_names()):
   ...:     print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
   ...:

 

This will print out something like the following:

...
Microphone with name "bcm2835 ALSA: - (hw:0,0)" found for `Microphone(device_index=0)`
Microphone with name "bcm2835 ALSA: IEC958/HDMI (hw:0,1)" found for `Microphone(device_index=1)`
Microphone with name "seeed-4mic-voicecard: - (hw:1,0)" found for `Microphone(device_index=2)`
Microphone with name "sysdefault" found for `Microphone(device_index=3)`
Microphone with name "playback" found for `Microphone(device_index=4)`
Microphone with name "dmixed" found for `Microphone(device_index=5)`
Microphone with name "ac108" found for `Microphone(device_index=6)`
Microphone with name "dmix" found for `Microphone(device_index=7)`
Microphone with name "default" found for `Microphone(device_index=8)`
Microphone with name "/dev/dsp" found for `Microphone(device_index=9)`

In [3]: 

 

Now, to use the seeed-4mic-voicecard microphone, you would change Microphone() to Microphone(device_index=2).
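
For instance (a minimal sketch, assuming the listing above where index 2 is the seeed-4mic-voicecard; substitute your own index):

import speech_recognition as sr

r = sr.Recognizer()
# device_index=2 matches the seeed-4mic-voicecard entry listed above
with sr.Microphone(device_index=2) as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)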

 

If problems arise, check your PyAudio version:

Speech Recognition Library Reference

Microphone(device_index: Union[int, None] = None, sample_rate: int = 16000, chunk_size: int = 1024) -> Microphone

Creates a new Microphone instance, which represents a physical microphone on the computer. Subclass of AudioSource.

This will throw an AttributeError if you don’t have PyAudio 0.2.11 or later installed.

If device_index is unspecified or None, the default microphone is used as the audio source. Otherwise, device_index should be the index of the device to use for audio input.

A device index is an integer between 0 and pyaudio.get_device_count() - 1 (assume we have used import pyaudio beforehand) inclusive. It represents an audio device such as a microphone or speaker. See the PyAudio documentation for more details.

The microphone audio is recorded in chunks of chunk_size samples, at a rate of sample_rate samples per second (Hertz).

Higher sample_rate values result in better audio quality, but also more bandwidth (and therefore, slower recognition). Additionally, some machines, such as some Raspberry Pi models, can’t keep up if this value is too high.

Higher chunk_size values help avoid triggering on rapidly changing ambient noise, but also make detection less sensitive. This value, generally, should be left at its default.

Instances of this class are context managers, and are designed to be used with with statements:

In [1]: import speech_recognition as sr

In [2]: m = sr.Microphone()
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
...

In [3]: m.get_pyaudio()
Out[3]: <module 'pyaudio' from '/usr/lib/python3/dist-packages/pyaudio.py'>

In [4]: m.list_microphone_names()
Out[4]: 
['bcm2835 ALSA: - (hw:0,0)',
 'bcm2835 ALSA: IEC958/HDMI (hw:0,1)',
 'seeed-4mic-voicecard: - (hw:1,0)',
 'sysdefault',
 'playback',
 'dmixed',
 'ac108',
 'dmix',
 'default',
 '/dev/dsp']

In [5]: with m as source:
   ...:     pass 
   ...: 

In [6]: 

 

Following on from the previous post, the ReSpeaker 4-Mic can use the 2-channel, 48 kHz system default:

Help on Microphone in module speech_recognition object:

class Microphone(AudioSource)
| Creates a new ``Microphone`` instance, which represents a physical microphone on the computer. Subclass of ``AudioSource``.
|
| This will throw an ``AttributeError`` if you don’t have PyAudio 0.2.11 or later installed.
|
| If ``device_index`` is unspecified or ``None``, the default microphone is used as the audio source. Otherwise, ``device_index`` should be the index of the device to use for audio input.
|
| A device index is an integer between 0 and ``pyaudio.get_device_count() - 1`` (assume we have used ``import pyaudio`` beforehand) inclusive. It represents an audio device such as a microphone or speaker. See the PyAudio documentation (http://people.csail.mit.edu/hubert/pyaudio/docs/) for more details.
|
| The microphone audio is recorded in chunks of ``chunk_size`` samples, at a rate of ``sample_rate`` samples per second (Hertz). If not specified, the value of ``sample_rate`` is determined automatically from the system’s microphone settings.
|
| Higher ``sample_rate`` values result in better audio quality, but also more bandwidth (and therefore, slower recognition). Additionally, some CPUs, such as those in older Raspberry Pi models, can’t keep up if this value is too high.
|
| Higher ``chunk_size`` values help avoid triggering on rapidly changing ambient noise, but also make detection less sensitive. This value, generally, should be left at its default.

 

Verification test OK ◎

pi@raspberrypi:~ $ ipython3 
Python 3.5.3 (default, Jan 19 2017, 14:11:04) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import speech_recognition as sr

In [2]: r = sr.Recognizer()

In [3]: with sr.Microphone() as source:
   ...:     r.adjust_for_ambient_noise(source)
   ...:     print("Say something!")
   ...:     audio = r.listen(source)
   ...:     
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
...

Say something!

In [4]: print("Sphinx thinks you said " + r.recognize_sphinx(audio))
Sphinx thinks you said oh oh

In [5]: 
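
As an aside (my own defensive variant, not part of the original session): recognize_sphinx raises sr.UnknownValueError when the speech is unintelligible and sr.RequestError when the Sphinx installation is missing or broken, so In [4] can be wrapped as:

try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))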


【鼎革‧革鼎】︰ Raspbian Stretch 《六之 K.3-言語界面-6.5 》

To tell the story from the beginning: some time ago a friend happened to visit and mentioned ReSpeaker compatibility problems on Raspbian Jessie?! This unexpectedly set off

PyAudio

PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms, such as GNU/Linux, Microsoft Windows, and Apple Mac OS X / macOS.

PyAudio is inspired by:

‧ pyPortAudio/fastaudio: Python bindings for PortAudio v18 API.
‧ tkSnack: cross-platform sound toolkit for Tcl/Tk and Python.

and these error messages:

pi@raspberrypi:~ $ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04) 
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyaudio
>>> p = pyaudio.PyAudio()
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'defaults.bluealsa.device'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'defaults.bluealsa.device'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4996:(snd_config_expand) Args evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM bluealsa
ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_ac108.c:469:(_snd_pcm_ac108_open) a108 is only for capture
Expression 'ioctl( devHandle, SNDCTL_DSP_CHANNELS, &numChannels )' failed in 'src/hostapi/oss/pa_unix_oss.c', line: 414
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
>>>

Hence the debate over who should take the blame! Since that library is used by so many "speech recognition" platforms, the web has long been abuzz with Q&A about it!?

─── 《Raspbian Stretch 《六之 K 》》 (http://www.freesandal.org/?p=80711)

Having already devoted a fair amount of space (http://www.freesandal.org/?p=80763) to introducing Python's Sound Device (http://www.freesandal.org/?p=80740), readers can presumably apply that on their own ☆ Here we return to supplement the pyaudio documentation a little:

PyAudio Documentation
http://people.csail.mit.edu/hubert/pyaudio/docs/

and to the matter of verification.

pi@raspberrypi:~ $ ipython3 
Python 3.5.3 (default, Jan 19 2017, 14:11:04) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pyaudio

In [2]: pyaudio.get_portaudio_version_text()
Out[2]: 'PortAudio V19.6.0-devel, revision 396fe4b6699ae929d3a685b3ef8a7e97396139a4'

In [3]: pyaudio.get_portaudio_version()
Out[3]: 1246720

In [4]: p = pyaudio.PyAudio()
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
...

In [5]: p.get_device_count()
Out[5]: 10

In [6]: p.get_default_input_device_info()
Out[6]: 
{'defaultHighInputLatency': 0.034829931972789115,
 'defaultHighOutputLatency': -1.0,
 'defaultLowInputLatency': 0.005804988662131519,
 'defaultLowOutputLatency': -1.0,
 'defaultSampleRate': 44100.0,
 'hostApi': 0,
 'index': 2,
 'maxInputChannels': 2,
 'maxOutputChannels': 0,
 'name': 'seeed-4mic-voicecard: - (hw:1,0)',
 'structVersion': 2}

In [7]: p.get_default_output_device_info()
Out[7]: 
{'defaultHighInputLatency': -1.0,
 'defaultHighOutputLatency': 0.034829931972789115,
 'defaultLowInputLatency': -1.0,
 'defaultLowOutputLatency': 0.005804988662131519,
 'defaultSampleRate': 44100.0,
 'hostApi': 0,
 'index': 8,
 'maxInputChannels': 0,
 'maxOutputChannels': 2,
 'name': 'default',
 'structVersion': 2}

In [8]: p.get_device_info_by_index(2)
Out[8]: 
{'defaultHighInputLatency': 0.034829931972789115,
 'defaultHighOutputLatency': -1.0,
 'defaultLowInputLatency': 0.005804988662131519,
 'defaultLowOutputLatency': -1.0,
 'defaultSampleRate': 44100.0,
 'hostApi': 0,
 'index': 2,
 'maxInputChannels': 2,
 'maxOutputChannels': 0,
 'name': 'seeed-4mic-voicecard: - (hw:1,0)',
 'structVersion': 2}

In [9]:
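
Tying those calls together, a small sketch (assuming the same machine as above) that lists only the capture-capable devices:

import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info['maxInputChannels'] > 0:   # capture-capable devices only
        print(i, info['name'], info['defaultSampleRate'])
p.terminate()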

 

For testing, please also refer to:

sound_recorder.py

Created Jan 28, 2014

Simple script to record sound from the microphone, dependencies: easy_install pyaudio
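
In the same spirit as that script, a minimal recorder sketch (my own; it assumes 16-bit mono capture from device index 2 at 44.1 kHz and a hypothetical output name out.wav):

import pyaudio
import wave

RATE, CHUNK, SECONDS = 44100, 1024, 5

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, input_device_index=2, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()

wf = wave.open('out.wav', 'wb')       # write the captured frames to a WAV file
wf.setnchannels(1)
wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
p.terminate()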


【鼎革‧革鼎】︰ Raspbian Stretch 《六之 K.3-言語界面-6.4 》

A paradox from over a century ago:

Bertrand paradox (probability)

The Bertrand paradox is a paradox arising from the classical interpretation of probability theory. Joseph Bertrand introduced it in 1888 in his work Calcul des probabilités, as an example to show that if the mechanism or method that produces a random variable is not clearly defined, then the probability itself is not well defined either.

Statement of the paradox

The paradox runs as follows: consider an equilateral triangle inscribed in a circle. If a chord of the circle is chosen at random, what is the probability that the chord is longer than a side of the triangle?

Bertrand gave three arguments, all apparently valid, yet all leading to different results.

  1. Random chords, method 1; red = longer than the triangle's side, blue = shorter

    The "random endpoints" method: choose two random points on the circumference and draw the chord joining them. To compute the probability in question, imagine the triangle rotated so that one of its vertices coincides with one endpoint of the chord. Observe that if the other endpoint lies on the arc between the two remaining vertices, the chord is longer than a side of the triangle. That arc is one third of the circumference, so the probability that a random chord is longer than a side of the triangle is 1/3.

  2. Random chords, method 2

    The "random radius" method: choose a radius of the circle and a point on that radius, then draw the chord through the point perpendicular to the radius. To compute the probability, imagine the triangle rotated so that one side is perpendicular to the radius. Observe that if the chosen point is closer to the centre than the point where the side intersects the radius, the chord is longer than a side of the triangle. Since the side bisects the radius, the probability is 1/2.

  3. Random chords, method 3

    The "random midpoint" method: choose a point anywhere inside the circle and draw the chord having that point as its midpoint. Observe that if the point falls inside a concentric circle of half the radius, the chord is longer than a side of the triangle. The smaller circle has one quarter the area of the large one, so the probability is 1/4.

These methods can be illustrated as follows. Every chord is uniquely determined by its midpoint, and the three methods give different distributions of midpoints: methods 1 and 2 give two different non-uniform distributions, while method 3 gives a uniform one. Looking instead at the chords themselves, the chords of method 2 appear the most uniformly spread, while those of methods 1 and 3 look less uniform.

[Figures: midpoints of the random chords for methods 1, 2, 3; the random chords themselves for methods 1, 2, 3.]

Many other ways of choosing the distribution can be devised, and with each method the probability that a random chord is longer than the triangle's side may come out differently.
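
A Monte Carlo sketch (my own illustration, not from the article) makes the three answers tangible; for the unit circle the inscribed triangle's side is √3:

import math
import random

N = 100000
SIDE = math.sqrt(3)   # side of the equilateral triangle inscribed in the unit circle

def endpoints():      # method 1: two uniform points on the circumference
    t = abs(random.uniform(0, 2 * math.pi) - random.uniform(0, 2 * math.pi))
    return 2 * math.sin(t / 2)

def radius():         # method 2: uniform point on a radius, chord perpendicular to it
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

def midpoint():       # method 3: uniform midpoint inside the disc
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return 2 * math.sqrt(1 - x * x - y * y)

for name, chord in (('endpoints', endpoints), ('radius', radius), ('midpoint', midpoint)):
    p = sum(chord() > SIDE for _ in range(N)) / N
    print(name, round(p, 3))   # ~0.333, ~0.5, ~0.25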

The matter remains unsettled to this day. Consider that any open interval of the reals can be put in correspondence with the whole real line; can one then afford to be anything but careful with the probability measure on a sample space? It is as surprising as a function that is everywhere continuous yet nowhere differentiable!

In 1872 the father of modern analysis, the German Karl Theodor Wilhelm Weierstraß, exhibited just such a counter-intuitive function, everywhere continuous but nowhere differentiable:

f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x),

where 0 < a < 1 and b is a positive odd integer such that ab > 1 + \frac{3}{2}\pi.
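
A quick numerical taste (my own sketch; a = 0.5, b = 13 gives ab = 6.5 > 1 + 3π/2 ≈ 5.71; floating-point cos(bⁿπx) degrades for large n, so only a modest number of terms is kept):

import math

def weierstrass(x, a=0.5, b=13, terms=12):
    # partial sum of f(x) = sum a^n cos(b^n pi x); the tail beyond a^12 is < 3e-4
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(terms))

print(weierstrass(0.0))   # close to 1/(1-a) = 2.0, since cos(0) = 1 in every term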

That beautiful "Koch snowflake" likewise arises as the limit of continuous line segments:

in 1904 the Swedish mathematician Niels Fabian Helge von Koch, dispensing with Weierstraß's abstract, analytic kind of definition, gave the intuitive geometric construction known today as the "Koch snowflake", ……

─── excerpted from 《加百利之號角!!》

─── 《時間序列︰伯特蘭悖論》

 

Could it be that when STT meets TTS, it is like turning one's own spear against one's own shield??!!

 

PicoTTS, though given words, stays silent!

 

GnuSpeech babbles nonsense?

※ See also
‧ 《 Raspbian Stretch 《六之 J.3‧MIR-3 》
‧ 《Raspbian Stretch 《六之 K.3-言語界面-3上 》

Making "go forward ten meters" actually go forward ten meters is truly not easy!!??


【鼎革‧革鼎】︰ Raspbian Stretch 《六之 K.3-言語界面-6.3 》

Famous couplet of the Three Laughs Courtyard, Donglin Temple, Mount Lu ‧ Qing dynasty ‧ Tang Woji

A bridge spans Tiger Brook: three teachings, three wellsprings, three men, three peals of laughter;
lotus blooms by the monks' quarters: one flower, one world, one leaf, one Tathāgata.

Once, three men shared three laughs and, hearing the tiger's roar, were suddenly awakened. Why, then, three odes and three outpourings of feeling?

Ode to the Two Shus ‧ Tao Yuanming

The great cycle turns the four seasons; those whose work is done take their leave.
Since the decline of Zhou, I ask, how many have grasped this truth?
Casting my gaze over the Han court: the two Shus made that move once more.
With a lofty whistle they returned to their old home, bowing out of the Crown Prince's tutelage.
The whole court turned out to see them off; splendid carriages filled the roads.
Parting has its sorrows, but what is lingering glory worth!
The deed so moved the onlookers that "How wise!" was no idle praise.
Content amid the village's quiet joys, what they pursued was no worldly business.
Drawing their mats close they welcomed the old folk, raising cups to talk over ordinary days.
Asked about the gold, they showed at last where their hearts lay, their lucid words waking the unawakened.
Giving free rein to their remaining years, why fret over what comes after?
Who says such men are gone? The longer time runs, the brighter their way shows.

Ode to the Three Good Men ‧ Tao Yuanming

Dusting off their caps they took the road to office, fearing only that the times would pass them by;
serving diligently through the years, ever afraid their merit would count for little.
Their loyal hearts, by chance laid bare, won them their lord's particular favor.
Abroad they attended the ornate carriage; within they waited by the crimson curtains;
their admonitions were always followed, their counsels never once at fault.
When one morning the lord was gone forever, they vowed to share that final journey.
Such deep grace is hard to forget, and how could a lord's command be defied?
At the grave's edge they did not waver; to give themselves for duty was their aspiration.
Brambles shroud the high tomb; the oriole's cry is pure grief.
The good men cannot be ransomed back; tears stream down and soak my robe.

Ode to Jing Ke ‧ Tao Yuanming

Prince Dan of Yan was good at keeping retainers; his purpose was vengeance on mighty Ying.
He gathered the best of a hundred men, and at year's end won Master Jing.
A gentleman dies for the one who knows his worth: sword in hand he left the Yan capital;
white steeds neighing on the broad road, all in brave spirits to see him off.
His bristling hair pushed at his high cap; his fierce resolve surged through the long hat-strings.
They drank the farewell toast above the Yi River, with a host of heroes ranged around.
Gao Jianli struck the mournful zhu; Song Yi sang out in a high voice.
Soughing, the sad wind passed away; rippling, the cold waves arose.
At the shang mode the tears flowed again; at the yu strains the brave man started.
He knew in his heart he would not return, yet a name would remain for later ages.
Mounting the carriage without once looking back, his flying canopy swept into the Qin court.
Dauntless he crossed ten thousand li, winding his way past a thousand towns.
When the map was unrolled to its end, the deed stood revealed; the overbearing ruler shook with fright.
Alas that his swordsmanship fell short, and the extraordinary feat went undone!
Though the man himself has perished, a thousand years on, the feeling endures.

Yuanming in the Jin, renamed Qian after the Jin fell; having cast aside the five pecks of rice, surname and style forgotten, who indeed was he? Was Master Five Willows chanting of "five" and of "willows"?? Truly, these few gentlemen were one in purpose!! Times shift and places differ, gains and losses with them, yet how alike were the men and their hearts??!! The master's feeling for the "willow", surely, remains the same!!??

Who says such men are gone? The longer time runs, the brighter their way shows.

The good men cannot be ransomed back; tears stream down and soak my robe.

Though the man himself has perished, a thousand years on, the feeling endures.

Great Heat has passed and autumn draws near, a fitting time to speak of this "heart-method" of "spring plowing and summer weeding". Learning, ancient and modern, East and West, may come in a thousand varieties, yet when it comes to the heart-method of real mastery there is but one. Time and again the purpose is one and the people are alike!! The only question is whether "one flower", "one leaf" is met with sincerity??

─── 《光的世界︰派生科學計算六‧下》

 

Between the lines we chase each word's sense, yet word senses may still leave concepts beyond reach!

What, then, are sound, rhyme, and syllable?

Syllable

A syllable is a unit of organization for a sequence of speech sounds. For example, the word water is composed of two syllables: wa and ter. A syllable is typically made up of a syllable nucleus (most often a vowel) with optional initial and final margins (typically, consonants).

Syllables are often considered the phonological “building blocks” of words. They can influence the rhythm of a language, its prosody, its poetic meter and its stress patterns.

Syllabic writing began several hundred years before the first letters. The earliest recorded syllables are on tablets written around 2800 BC in the Sumerian city of Ur. This shift from pictograms to syllables has been called “the most important advance in the history of writing”.[1]

A word that consists of a single syllable (like English dog) is called a monosyllable (and is said to be monosyllabic). Similar terms include disyllable (and disyllabic; also bisyllable and bisyllabic) for a word of two syllables; trisyllable (and trisyllabic) for a word of three syllables; and polysyllable (and polysyllabic), which may refer either to a word of more than three syllables or to any word of more than one syllable.

 

And the onset of a sound springs first from the field of the heart!?

Onset (audio)

Onset refers to the beginning of a musical note or other sound. It is related to (but different from) the concept of a transient: all musical notes have an onset, but do not necessarily include an initial transient.

In phonetics the term is used differently – see syllable onset.

Onset detection

In signal processing, onset detection is an active research area. For example, the MIREX annual competition features an Audio Onset Detection contest.

Approaches to onset detection can operate in the time domain, frequency domain, phase domain, or complex domain, and include looking for increases in spectral energy, changes in spectral energy distribution (spectral flux) or phase, changes in detected pitch, and spectral patterns recognisable by machine learning techniques such as neural networks.

Simpler techniques such as detecting increases in time-domain amplitude can typically lead to an unsatisfactorily high amount of false positives or false negatives.

The aim is often to judge onsets similarly to how a human would: so psychoacoustically-motivated strategies may be employed. Sometimes the onset detector can be restricted to a particular domain (depending on intended application), for example being targeted at detecting percussive onsets. With a narrower focus, it can be more straightforward to obtain reliable detection.
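
To make the time-domain remark concrete, a deliberately naive sketch (numpy only, my own illustration): flag an onset whenever short-window RMS energy jumps by a fixed ratio; exactly the kind of detector that over- and under-triggers as described.

import numpy as np

def naive_onsets(y, sr, win=1024, ratio=2.0):
    """Return rough onset times (seconds) where RMS energy jumps by `ratio`."""
    frames = y[: len(y) // win * win].reshape(-1, win)   # non-overlapping windows
    rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)
    jumps = np.flatnonzero(rms[1:] > ratio * rms[:-1]) + 1
    return jumps * win / sr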

 

Turning about, we reach instead for the sword of Kan and Li:

Onset detection

onset_detect([y, sr, onset_envelope, …]) Basic onset detector.
onset_backtrack(events, energy) Backtrack detected onset events to the nearest preceding local minimum of an energy function.
onset_strength([y, sr, S, lag, max_size, …]) Compute a spectral flux onset strength envelope.
onset_strength_multi([y, sr, S, lag, …]) Compute a spectral flux onset strength envelope across multiple channels.
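
A usage sketch of the first two routines (my own; hypothetical file name speech.wav, and units='time' needs librosa 0.5 or later):

import librosa

y, sr = librosa.load('speech.wav')
env = librosa.onset.onset_strength(y=y, sr=sr)            # spectral-flux envelope
times = librosa.onset.onset_detect(y=y, sr=sr, onset_envelope=env, units='time')
print(times)   # onset times in seconds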
 

and, with ears keen and eyes clear, the pearls are handed on ◎

※ See also

‧ 《【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-13.2 》
‧ 《【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-13.6 》


【鼎革‧革鼎】︰ Raspbian Stretch 《六之 K.3-言語界面-6.2 》

How far apart are "speech-to-text" (STT) and "transcription", really?

扒帶 (English: transcription), also called 扒譜, is the process of writing down the full score of a piece of music by listening to it over and over. Its main purpose is to recover the original score so that the piece can be studied and performed. Transcription is also useful for re-creating MIDI music from a recording; many mobile-phone ringtones, for instance, have been produced this way.

The ear's sensitivity to pitch and its ability to distinguish instruments directly determine the quality of a transcription. Conversely, transcription is also excellent training for the ear.

──

Transcription (music)

In music, transcription can mean notating a piece or a sound which was previously unnotated, as, for example, an improvised jazz solo. When a musician is tasked with creating sheet music from a recording and they write down the notes that make up the song in music notation, it is said that they created a musical transcription of that recording. Transcription may also mean rewriting a piece of music, either solo or ensemble, for another instrument or other instruments than which it was originally intended. The Beethoven Symphonies by Franz Liszt are a good example. Transcription in this sense is sometimes called arrangement, although strictly speaking transcriptions are faithful adaptations, whereas arrangements change significant aspects of the original piece.

Further examples of music transcription include ethnomusicological notation of oral traditions of folk music, such as Béla Bartók‘s and Ralph Vaughan Williams‘ collections of the national folk music of Hungary and England respectively. The French composer Olivier Messiaen transcribed birdsong in the wild, and incorporated it into many of his compositions, for example his Catalogue d’oiseaux for solo piano. Transcription of this nature involves scale degree recognition and harmonic analysis, both of which the transcriber will need relative or perfect pitch to perform.

In popular music and rock, there are two forms of transcription. Individual performers copy a note-for-note guitar solo or other melodic line. As well, music publishers transcribe entire recordings of guitar solos and bass lines and sell the sheet music in bound books. Music publishers also publish PVG (piano/vocal/guitar) transcriptions of popular music, where the melody line is transcribed, and then the accompaniment on the recording is arranged as a piano part. The guitar aspect of the PVG label is achieved through guitar chords written above the melody. Lyrics are also included below the melody.

───

 

We have previously made MIR notes on the

Pitch Transcription Exercise

 

Anyone who has worked through that exercise should already know the decomposition, combination, and evaluation methods it employs.

Automatic music transcription

The term “automatic music transcription” was first used by audio researchers James A. Moorer, Martin Piszczalski, and Bernard Galler in 1977. With their knowledge of digital audio engineering, these researchers believed that a computer could be programmed to analyze a digital recording of music such that the pitches of melody lines and chord patterns could be detected, along with the rhythmic accents of percussion instruments. The task of automatic music transcription concerns two separate activities: making an analysis of a musical piece, and printing out a score from that analysis.[1]

This was not a simple goal, but one that would encourage academic research for at least another three decades. Because of the close scientific relationship of speech to music, much academic and commercial research that was directed toward the more financially resourced speech recognition technology would be recycled into research about music recognition technology. While many musicians and educators insist that manually doing transcriptions is a valuable exercise for developing musicians, the motivation for automatic music transcription remains the same as the motivation for sheet music: musicians who do not have intuitive transcription skills will search for sheet music or a chord chart, so that they may quickly learn how to play a song. A collection of tools created by this ongoing research could be of great aid to musicians. Since much recorded music does not have available sheet music, an automatic transcription device could also offer transcriptions that are otherwise unavailable in sheet music. To date, no software application can yet completely fulfill James Moorer’s definition of automatic music transcription. However, the pursuit of automatic music transcription has spawned the creation of many software applications which can aid in manual transcription. Some can slow down music while maintaining original pitch and octave, some can track the pitch of melodies, some can track the chord changes, and others can track the beat of music.

Automatic transcription most fundamentally involves identifying the pitch and duration of the performed notes. This entails tracking pitch and identifying note onsets. After capturing those physical measurements, this information is mapped into traditional music notation, i.e., the sheet music.

Digital Signal Processing is the branch of engineering that provides software engineers with the tools and algorithms needed to analyze a digital recording in terms of pitch (note detection of melodic instruments), and the energy content of un-pitched sounds (detection of percussion instruments). Musical recordings are sampled at a given recording rate and its frequency data is stored in any digital wave format in the computer. Such format represents sound by digital sampling.
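
A sketch of those two core measurements with librosa (my own illustration; hypothetical file name melody.wav): note onsets plus the strongest pitch candidate in each frame.

import librosa
import numpy as np

y, sr = librosa.load('melody.wav')
onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')
pitches, mags = librosa.piptrack(y=y, sr=sr)
best = pitches[mags.argmax(axis=0), np.arange(pitches.shape[1])]  # Hz per frame, 0 = unvoiced
print(onsets[:5], best[:5])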

 

Hence the idea of borrowing the librosa library to explore the spectral features of "speech".

Spectral features

chroma_stft([y, sr, S, norm, n_fft, …]) Compute a chromagram from a waveform or power spectrogram.
chroma_cqt([y, sr, C, hop_length, fmin, …]) Constant-Q chromagram
chroma_cens([y, sr, C, hop_length, fmin, …]) Computes the chroma variant “Chroma Energy Normalized” (CENS), following [R15].
melspectrogram([y, sr, S, n_fft, …]) Compute a mel-scaled spectrogram.
mfcc([y, sr, S, n_mfcc]) Mel-frequency cepstral coefficients
rmse([y, S, frame_length, hop_length, …]) Compute root-mean-square (RMS) energy for each frame, either from the audio samples y or from a spectrogram S.
spectral_centroid([y, sr, S, n_fft, …]) Compute the spectral centroid.
spectral_bandwidth([y, sr, S, n_fft, …]) Compute p’th-order spectral bandwidth:
spectral_contrast([y, sr, S, n_fft, …]) Compute spectral contrast [R16]
spectral_rolloff([y, sr, S, n_fft, …]) Compute roll-off frequency
poly_features([y, sr, S, n_fft, hop_length, …]) Get coefficients of fitting an nth-order polynomial to the columns of a spectrogram.
tonnetz([y, sr, chroma]) Computes the tonal centroid features (tonnetz), following the method of [R17].
zero_crossing_rate(y[, frame_length, …]) Compute the zero-crossing rate of an audio time series.
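
For example, the features most often used for speech (my own sketch; hypothetical file name speech.wav):

import librosa

y, sr = librosa.load('speech.wav', sr=16000)              # resample to 16 kHz
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, n_frames)
mel = librosa.feature.melspectrogram(y=y, sr=sr)          # mel-scaled power spectrogram
print(mfcc.shape, mel.shape)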

 

And one may further savor linear prediction:

Linear prediction

Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples.

In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a subset of filter theory. In system analysis (a subfield of mathematics), linear prediction can be viewed as a part of mathematical modelling or optimization.

The prediction model

The most common representation is

  \widehat{x}(n) = \sum_{i=1}^{p} a_i x(n-i)

where \widehat{x}(n) is the predicted signal value, x(n-i) the previous observed values, and a_i the predictor coefficients. The error generated by this estimate is

e(n) = x(n) - \widehat{x}(n)

where x(n) is the true signal value.

These equations are valid for all types of (one-dimensional) linear prediction. The differences are found in the way the predictor coefficients a_i are chosen.

For multi-dimensional signals the error metric is often defined as

  e(n) = \| x(n) - \widehat{x}(n) \|

where \| \cdot \| is a suitably chosen vector norm. Predictions such as \widehat{x}(n) are routinely used within Kalman filters and smoothers[1] to estimate current and past signal values, respectively.

 

and the purport it carries ◎
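
To close, a numpy sketch of the prediction model above (my own illustration): the coefficients a_i are obtained from the autocorrelation normal equations via the Levinson-Durbin recursion.

import numpy as np

def lpc(x, p):
    """Estimate a_1..a_p such that x_hat(n) = sum_i a_i * x(n - i)."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)])  # autocorrelation
    a, err = np.zeros(p), r[0]
    for i in range(p):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err   # reflection coefficient
        a[:i] = a[:i] - k * a[:i][::-1]                   # update lower-order coefficients
        a[i] = k
        err *= 1 - k * k                                  # prediction-error power shrinks
    return a

# sanity check: recover the coefficients of a synthetic AR(2) process
np.random.seed(0)
x = np.zeros(10000)
for t in range(2, len(x)):
    x[t] = 1.3 * x[t - 1] - 0.4 * x[t - 2] + np.random.randn()
print(lpc(x, 2))   # approximately [1.3, -0.4]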