【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-6 》

王小玉說書

清‧劉鶚‧《老殘遊記》

第二回 歷山山下古帝遺蹤 明湖湖邊美人絕調

停了數分鐘時,簾子裡面出來一個姑娘,約有十六七歲,長長鴨蛋臉兒,梳了一個抓髻,戴了一副銀耳環,穿了一件藍布外褂兒,一條藍布褲子,都是黑布鑲滾的。雖是粗布衣裳,到十分潔淨。來到半桌後面右手椅子上坐下。那彈弦子的便取了弦子,錚錚鏦鏦彈起。這姑娘便立起身來,左手取了梨花簡,夾在指頭縫裡,便丁丁當當的敲,與那弦子聲音相應。右手持了鼓捶子,凝神聽那弦子的節奏。忽羯鼓一聲,歌喉遽發,字字清脆,聲聲宛轉,如新鶯出谷,乳燕歸巢,每句七字,每段數十句,或緩或急,忽高忽低。其中轉腔換調之處,百變不窮,覺一切歌曲腔調俱出其下,以為觀止矣。

旁坐有兩人,其一人低聲問那人道:「此想必是白妞了罷?」其一人道:「不是。這人叫黑妞,是白妞的妹子。他的調門兒都是白妞教的,若比白妞,還不曉得差多遠呢!他的好處人說得出,白妞的好處人說不出;他的好處人學的到,白妞的好處人學不到。你想,這幾年來,好玩耍的誰不學他們的調兒呢?就是窯子裡的姑娘,也人人都學,只是頂多有一兩句到黑妞的地步。若白妞的好處,從沒有一個人能及他十分裡的一分的。」說著的時候,黑妞早唱完,後面去了。這時滿園子裡的人,談心的談心,說笑的說笑。賣瓜子、落花生、山裡紅、核桃仁的,高聲喊叫著賣,滿園子裡聽來都是人聲。

正在熱鬧哄哄的時節,只見那後臺裡,又出來了一位姑娘,年紀約十八九歲,裝束與前一個毫無分別。瓜子臉兒,白淨麵皮,相貌不過中人以上之姿,只覺得秀而不媚,清而不寒。半低著頭出來,立在半桌後面,把梨花簡丁當了幾聲。煞是奇怪,只是兩片頑鐵,到他手裡,便有了五音十二律似的。又將鼓捶子輕輕的點了兩下,方抬起頭來,向臺下一盼。那雙眼睛,如秋水,如寒星,如寶珠,如白水銀裡頭養著兩丸黑水銀,左右一顧一看,連那坐在遠遠牆角子裡的人,都覺得王小玉看見我了,那坐得近的更不必說。就這一眼,滿園子裡便鴉雀無聲,比皇帝出來還要靜悄得多呢,連一根針跌在地下都聽得見響!

王小玉便啟朱脣,發皓齒,唱了幾句書兒。聲音初不甚大,只覺入耳有說不出來的妙境。五臟六腑裡,像熨斗熨過,無一處不伏貼。三萬六千個毛孔,像吃了人參果,無一個毛孔不暢快。唱了十數句之後,漸漸的越唱越高,忽然拔了一個尖兒,像一線鋼絲拋入天際,不禁暗暗叫絕。那知他於那極高的地方,尚能迴環轉折。幾囀之後,又高一層,接連有三四疊,節節高起。恍如由傲來峰西面攀登泰山的景象,初看傲來峰削壁千仞,以為上與天通。及至翻到傲來峰頂,才見扇子崖更在傲來峰上。及至翻到扇子崖,又見南天門更在扇子崖上。愈翻愈險,愈險愈奇。

那王小玉唱到極高的三四疊後,陡然一落,又極力騁其千迴百折的精神,如一條飛蛇在黃山三十六峰半中腰裡盤旋穿插。頃刻之間,周匝數遍。從此以後,愈唱愈低,愈低愈細,那聲音漸漸的就聽不見了。滿園子的人都屏氣凝神,不敢少動。約有兩三分鐘之久,彷彿有一點聲音從地底下發出。這一出之後,忽又揚起,像放那東洋煙火,一個彈子上天,隨化作千百道五色火光,縱橫散亂。這一聲飛起,即有無限聲音俱來並發。那彈弦子的亦全用輪指,忽大忽小,同他那聲音相和相合,有如花塢春曉,好鳥亂鳴。耳朵忙不過來,不曉得聽那一聲的為是。正在撩亂之際,忽聽霍然一聲,人弦俱寂。這時臺下叫好之聲,轟然雷動。

停了一會,鬧聲稍定,只聽那臺下正座上,有一個少年人,不到三十歲光景,是湖南口音,說道:「當年讀書,見古人形容歌聲的好處,有那『餘音繞梁,三日不絕』的話,我總不懂。空中設想,餘音怎樣會得繞梁呢?又怎會三日不絕呢?及至聽了小玉先生說書,才知古人措辭之妙。每次聽他說書之後,總有好幾天耳朵裡無非都是他的書,無論做什麼事,總不入神,反覺得『三日不絕』,這『三日』二字下得太少,還是孔子『三月不知肉味』,『三月』二字形容得透徹些!」旁邊人都說道:「夢湘先生論得透闢極了!『於我心有戚戚焉』!」

說著,那黑妞又上來說了一段,底下便又是白妞上場。這一段,聞旁邊人說,叫做「黑驢段」。聽了去,不過是一個士子見一個美人,騎了一個黑驢走過去的故事。將形容那美人,先形容那黑驢怎樣怎樣好法,待鋪敘到美人的好處,不過數語,這段書也就完了。其音節全是快板,越說越快。白香山詩云:「大珠小珠落玉盤。」可以盡之。其妙處在說得極快的時候,聽的人彷彿都趕不上聽,他卻字字清楚,無一字不送到人耳輪深處。這是他的獨到,然比著前一段卻未免遜一籌了。


詩經毛詩序

情發於聲,聲成文謂之音。治世之音安以樂,其政和;亂世之音怨以怒,其政乖;亡國之音哀以思,其民困。故正得失,動天地,感鬼神,莫近於詩。先王以是經夫婦,成孝敬,厚人倫,美教化,移風俗。

風聲雨聲讀書聲雖然都是『聲』,但不知有幾人能詮釋『地籟』之『音』;或許『誦讀聲』偶然入耳,聽之卻有『弦外之音』。終於『寰宇的振動』一分為三,成為了『自然之聲』、『言語之音』以及『動人之樂』!王小玉說書,字字清晰詞詞明白,音似行雲且聲若流水,一時雷鳴九霄之外,忽而泉湧九地之下,彼音擬樂此聲知音,相追相逐鎔鑄成了『天籟』的聲樂旋律!!

……

音樂聲波

聲音頻譜

聲音合成器(圖:不同頻率的正弦波、Karplus–Strong 合成示意、ADSR 波封參數、Mixtur-Trautonium、Moog Modular 55)

超聲波影像(圖:主動脈重複偽影)

假使換個角度從頻率上看,最早被人們所認識的聲波當然是人耳能夠聽到的『可聞音』,這可關係到了『語言』、『音樂』、『樂器』、『空間音質』與『噪音』等等,它們分別對應著『語言聲學』、『音樂聲學』、『樂器聲學』、『聲場聲學』以及『噪音控制』種種。然後又及於『聽覺』的『生理、心理聲學』,並隨著一八八零年法國物理學家皮埃爾‧居禮 Pierre Curie 和雅克‧居禮 Jacques Curie 兄弟發現『電氣石』具有『壓電效應』,開啟了聲波頻率超過 20 kHz 的『超聲波』之大門。當聲波頻率再超過 500 MHz 稱為『特超聲』,它的『波長』已經可以與『分子』大小相比擬,它的研究就叫做『分子聲學』。反過來講當聲波頻率低於 20 Hz 有『次聲學』,用以研究『火山爆發』或者『流星爆炸』所產生的『聲重力波』。也可以說『物理聲學』正與眾多學科領域交叉融會,匯聚成洋洋大觀的『科技前沿』,果真是既古又新的啊!!

然而電子『聲音合成器』的發展歷史雖不可能早於一八二七年德國物理學家蓋歐格‧西蒙‧歐姆 Georg Simon Ohm 在《直流電路的數學研究》一文中『歐姆定律』的發表,如今卻已經很難追溯!現今所說的『合成器』Synthesizer,是利用多種『電子技術』 ── 比方說,加法合成、減法合成、FM、相位調變… ──,或者使用『物理模型』發聲的『電子樂器』── 也常稱作鍵盤樂器 ──。

『Sonic π』的發聲軟體的核心就是一種『軟體合成器』,使用樹莓派『模擬』了二十三種『聲音合成』的方式,採用『樂器數位介面』MIDI(Musical Instrument Digital Interface)的描述碼來表達『音符』。同時這個軟體合成器對於一個『聲音』的發聲控制,採取了一般常用的『ADSR』(Attack-Decay-Sustain-Release)波封機制。
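上述 ADSR 波封可以用分段線性函數粗略描摹。以下是一個以 numpy 寫成的示意草稿(函數名與參數數值皆為假設,並非 Sonic Pi 的實際實作):

```python
import numpy as np

def adsr_envelope(n_samples, sr=44100, attack=0.05, decay=0.1,
                  sustain=0.6, release=0.2):
    """以線性片段近似 ADSR 波封(示意用)。
    attack/decay/release 以秒為單位,sustain 為 0~1 的持續音量。"""
    a = int(attack * sr)
    d = int(decay * sr)
    r = int(release * sr)
    s = max(n_samples - a - d - r, 0)   # sustain 段樣本數
    env = np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),      # Attack:由靜音升至峰值
        np.linspace(1.0, sustain, d, endpoint=False),  # Decay:由峰值降至持續音量
        np.full(s, sustain),                           # Sustain:維持音量
        np.linspace(sustain, 0.0, r),                  # Release:降回靜音
    ])
    return env[:n_samples]

sr = 44100
env = adsr_envelope(sr, sr=sr)                             # 一秒長的波封
tone = env * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 套在 440 Hz 正弦波上
```

將波封乘上振盪器輸出,便得到有起音與釋音的『音符』;調整四個參數,即是控制發音的輕重緩急。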

 

─── 發音的輕重緩急和抑揚頓挫表達著人聲之美 ───

── 《【Sonic π】描摹聲音》

 

其實樂譜中之豆芽菜、MIDI 裡的數字碼,都是紀錄音樂的符號︰

Symbolic music representations comprise any kind of score representation with an explicit encoding of notes or other musical events. These include machine-readable data formats such as MIDI. Any kind of digital data format may be regarded as symbolic since it is based on a finite alphabet of letters or symbols.

 

方便人或是機器重現曲調乎?

MIDI

MIDI (/ˈmɪdi/; short for Musical Instrument Digital Interface) is a technical standard that describes a communications protocol, digital interface and electrical connectors and allows a wide variety of electronic musical instruments, computers and other related music and audio devices to connect and communicate with one another.[1] A single MIDI link can carry up to sixteen channels of information, each of which can be routed to a separate device.

MIDI carries event messages that specify notation, pitch and velocity (loudness or softness), control signals for parameters such as volume, vibrato, audio panning from left to right, cues in theatre, and clock signals that set and synchronize tempo between multiple devices. These messages are sent via a MIDI cable to other devices where they control sound generation and other features. A simple example of a MIDI setup is the use of a MIDI controller such as an electronic musical keyboard to trigger sounds created by a sound module, which is in turn plugged into a keyboard amplifier. This MIDI data can also be recorded into a hardware or software device called a sequencer, which can be used to edit the data and to play it back at a later time.[2]:4
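MIDI 的事件訊息本身只是幾個位元組:例如 Note-On 訊息由狀態位元組(0x90 加上通道號)、音符編號與力度(velocity)組成。以下用純 Python 粗略示意訊息格式(僅為說明,並非完整的 MIDI 程式庫):

```python
def note_on(channel, note, velocity):
    """組成一個 MIDI Note-On 訊息(三個位元組)。
    channel 0-15、note 0-127、velocity 0-127。"""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Note-Off 訊息的狀態位元組是 0x80。"""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x80 | channel, note, velocity])

# 中央 C(MIDI 編號 60)在 channel 0 以中等力度發聲
msg = note_on(0, 60, 64)
```

由此可見 MIDI 記錄的是『事件』而非『波形』,所以檔案極小,也才談得上十六個通道共用一條線。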

Advantages of MIDI include small file size, ease of modification and manipulation and a wide choice of electronic instruments and synthesizer or digitally-sampled sounds.[3] Prior to the development of MIDI, electronic musical instruments from different manufacturers could generally not communicate with each other. With MIDI, any MIDI-compatible keyboard (or other controller device) can be connected to any other MIDI-compatible sequencer, sound module, drum machine, synthesizer, or computer, even if they are made by different manufacturers.

MIDI technology was standardized in 1983 by a panel of music industry representatives, and is maintained by the MIDI Manufacturers Association (MMA). All official MIDI standards are jointly developed and published by the MMA in Los Angeles, and the MIDI Committee of the Association of Musical Electronics Industry (AMEI) in Tokyo. In 2016, the MMA established The MIDI Association (TMA) to support a global community of people who work, play, or create with MIDI.[4]

MIDI allows multiple instruments to be played from a single controller (often a keyboard, as pictured here), which makes stage setups much more portable. This system fits into a single rack case, but prior to the advent of MIDI, it would have required four separate full-size keyboard instruments, plus outboard mixing and effects units.

 

所以 librosa 亦十分看重耶!

Core IO and DSP

Time and frequency conversion

frames_to_samples(frames[, hop_length, n_fft]) Converts frame indices to audio sample indices
frames_to_time(frames[, sr, hop_length, n_fft]) Converts frame counts to time (seconds)
samples_to_frames(samples[, hop_length, n_fft]) Converts sample indices into STFT frames.
samples_to_time(samples[, sr]) Convert sample indices to time (in seconds).
time_to_frames(times[, sr, hop_length, n_fft]) Converts time stamps into STFT frames.
time_to_samples(times[, sr]) Convert timestamps (in seconds) to sample indices.
hz_to_note(frequencies, **kwargs) Convert one or more frequencies (in Hz) to the nearest note names.
hz_to_midi(frequencies) Get the closest MIDI note number(s) for given frequencies
midi_to_hz(notes) Get the frequency (Hz) of MIDI note(s)
midi_to_note(midi[, octave, cents]) Convert one or more MIDI numbers to note strings.
note_to_hz(note, **kwargs) Convert one or more note names to frequency (Hz)
note_to_midi(note[, round_midi]) Convert one or more spelled notes to MIDI number(s).
hz_to_mel(frequencies[, htk]) Convert Hz to Mels
hz_to_octs(frequencies[, A440]) Convert frequencies (Hz) to (fractional) octave numbers.
mel_to_hz(mels[, htk]) Convert mel bin numbers to frequencies
octs_to_hz(octs[, A440]) Convert octaves numbers to frequencies.
fft_frequencies([sr, n_fft]) Alternative implementation of np.fft.fftfreq
cqt_frequencies(n_bins, fmin[, …]) Compute the center frequencies of Constant-Q bins.
mel_frequencies([n_mels, fmin, fmax, htk]) Compute the center frequencies of mel bands.
tempo_frequencies(n_bins[, hop_length, sr]) Compute the frequencies (in beats-per-minute) corresponding to an onset auto-correlation or tempogram matrix.
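上表中幀(frame)、樣本(sample)與時間之間的換算,其實只是簡單的乘除。以下以 librosa 的預設值 sr=22050、hop_length=512 為例,用 numpy 重寫幾個換算公式以示意(是公式的草稿,並非 librosa 原始碼):

```python
import numpy as np

def frames_to_samples(frames, hop_length=512):
    # 每個 STFT 幀前進 hop_length 個樣本
    return np.asarray(frames) * hop_length

def samples_to_time(samples, sr=22050):
    # 樣本索引除以取樣率即得秒數
    return np.asarray(samples) / sr

def frames_to_time(frames, sr=22050, hop_length=512):
    # 兩段換算串接起來:幀 → 樣本 → 秒
    return samples_to_time(frames_to_samples(frames, hop_length), sr)

t = frames_to_time([0, 43])   # 第 43 幀約在 43*512/22050 ≈ 0.9985 秒處
```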

 

然而莫忘,不轉化為聲音,終究無法聽聞也◎

 

 

 

 

 

 

 

 

【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-5 》

Music can be represented in many different ways. The printed, visual form of a musical work is called a score or sheet music. For example, here is a sheet music excerpt from Mozart Piano Sonata No. 11 K. 331:

Sheet music consists of notes. A note has several properties including pitch, timbre, loudness, and duration.

Pitch (Wikipedia) is a perceptual property that indicates how “high” or “low” a note sounds. Pitch is closely related to the fundamental frequency sounded by the note, although fundamental frequency is a physical property of the sound wave.

An octave (Wikipedia) is an interval between two notes where the higher note is twice the fundamental frequency of the lower note. For example, an A at 440 Hz and an A at 880 Hz are separated by one octave. Here are two Cs separated by one octave:
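Since an octave is a doubling of fundamental frequency, the number of octaves between two frequencies is just the base-2 logarithm of their ratio. A small sketch (the function name is ours, for illustration):

```python
import math

def octaves_between(f_low, f_high):
    """Number of octaves between two frequencies: log2 of their ratio."""
    return math.log2(f_high / f_low)

n = octaves_between(440.0, 880.0)   # A440 and A880 are one octave apart
```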

 

循名責實雖然辛苦,卻是果實豐碩也!想當日無有如今方便網際網路呦?因此難得這樣勤快哩!?故耳退而解之為

隨時開始學習都可矣。

或許曾經浸蘊視覺主觀性太久︰

GoPiGo 小汽車︰格點圖像算術《色彩空間》主觀性…

聽覺亦耳順的耶??!!

於是敢借『詞條意義』敷衍乎!!??

音符

音符是西方音樂的基本元素,將音樂打散成它的最小組成,讓人們得以演奏、理解和分析。在音樂中有幾個主要的意義:用來表示相對長度的固定音高單位;樂譜中表達前述的單位的圖示;代表某一個音高的聲音。

[來源請求]音樂家常常隨意混用這兩種意義,但是對剛開始進入音樂領域的人們而言,常常因此造成混淆。以《生日快樂歌》作為例子,我們可以說「這首歌由兩個同音高的音符開始」,或者是「這個作品由重複同一個音符開始」。前面這個說法裡,音符用來表示一個特定的音樂事件:單獨且擁有長度的固定音高單位;對於後者而言,它代表一個音樂事件分類,只要是同音高者都在此分類中。

音名

兩個音符之間若頻率相差整數倍,則聽起來非常相似。因此,我們將這些音放在同一個音高集合(pitch class)中。兩個音符間若相差一倍的頻率,則我們稱兩者之間相差一個八度。要完整描述一個音符,則必須同時說出它的類別以及它在哪個八度之中。在傳統音樂理論中,我們使用前七個拉丁字母:A、B、C、D、E、F、G(按此順序則音高循序而上)以及一些變化(詳情請見下文)來標示不同的音符。這些字母名字不斷的重複,在G上面又是A(比起前一個A高八度)。為了標示同名(在同一個音高集合中)但不同高度的音符,科學音調記號法(scientific pitch notation)利用字母及一個用來表示所在八度的阿拉伯數字,明確指出音符的位置。比如說,現在的標準調音音高440赫茲名為A4,往上高八度則為A5,繼續向上可無限延伸;至於A4往下,則為A3、A2…。傳統上,八度的數字標注由C音符開始,結束於B。舉例而言,C4上方的D為D4,而C4下方的B則為B3(也就是說,兩者在不同的八度內)。另外一種的標示法稱為絕對音名,這種標示方法是以C-B為一組(C、D、E、F、G、A、B),現在的標準調音音高為a1,中央C則是c1,而往下一個八度為c,再往下一個八度則為大寫的C,繼續往下則是C1、C2…等。而從中央C繼續往上八度則是c2、c3、c4等。

 

音高

音高(英語:pitch)在音樂領域裡指的是人類心理對音符基頻之感受。

對音高的感知

雖然不同樂器的頻譜不同,但任何樂器演奏中央C上的A音符基頻皆為440Hz,因此所感受之音高皆同。此外,即使頻率有些許改變,聽者感受之音高未必改變,但若音高改變通常意味頻率亦改變。事實上,最小可覺差(just noticeable difference,此為一個臨界值,指可被感受到的音高變化量)大約等於五音分(也就是大約等於半音的百分之五),但是其會隨著人耳可聽頻率的不同而改變,且同時比較兩個音高會更為精確。一般而言,某些相似的音高亦會迷惑聽覺系統,導致聽覺錯覺產生。其中的例子包括三全音矛盾與Shepard scale。
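文中的『音分』(cent)定義為半音的百分之一,兩頻率間的音分數即 1200·log₂(f₂/f₁)。以下示意計算約五音分(大約是最小可覺差)在 440 Hz 附近對應的頻率變化(函數名為示意所取):

```python
import math

def cents(f1, f2):
    """兩頻率之間的音程,以音分(cent)表示;一個半音 = 100 音分。"""
    return 1200 * math.log2(f2 / f1)

# 440 Hz 往上五音分的頻率:差距只有一赫茲多一點
f_jnd = 440 * 2 ** (5 / 1200)
```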

標準音高

中央C上之A音符發出的頻率為440Hz(表示成「A=440Hz」,或是「A440」),通常被當作「標準音高」。但歷史上並非一開始就是以A440做為標準音高(參見歷史上的音高標準)。而音高通常是人類對音樂最基本的觀點。

音高標記

音高通常使用科學音高記號法或使用結合字母與數字(用以表示基頻)而成的記錄法。舉例而言,「A4」或「A440」都用來表示中央C上的A音符。然而,這樣的記譜法會造成兩個麻煩。首先,在西方十二平均律中,一個音的稱呼法並不是唯一的,比如「重升G4」所指的音高其實就是「A4」。另外,人類對音高的感受與基頻成對數性的:對人耳而言,「A220」到「A440」之間的差距跟「A440」到「A880」之間相同。

為了避免這些問題,音樂理論家有時候利用數字尺度,將一個數字與基頻之間的對數關係表達一個音的音高。比方說,我們可以由廣為使用的MIDI標準,將基頻 f 對應成一數字 p:

p = 69 + 12 \log_2 \left( \frac{f}{440} \right)

當然我們也可用這數字 p 由下列的方程式轉換回基頻 f:

f = 440 \times 2^{(p-69)/12}

此方程式創造了一線性的音高空間,每一個八度大小都是12,半音(在鋼琴上相鄰的兩個鍵所擁有的音程)之間則相差1,至於「A440」的號碼則指定為69。在這個空間中的距離與心理學實驗得到的音樂距離相符,而且這個表示法也被音樂家接受。這個系統具有一定程度的彈性,可以用來表示一個在標準鋼琴鍵盤上不存在的音。例如,若要表示C(60)與C#(61)中間的音高時,我們可以標示為60.5。
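上列兩個方程式可以直接寫成 Python 函數互相驗證:A440 對應編號 69,而像 60.5 這樣『琴鍵之間』的音高也能表示:

```python
import math

def hz_to_midi(f):
    # p = 69 + 12*log2(f/440)
    return 69 + 12 * math.log2(f / 440.0)

def midi_to_hz(p):
    # f = 440 * 2^((p-69)/12),是上式的反函數
    return 440.0 * 2 ** ((p - 69) / 12)

p = hz_to_midi(440.0)     # A440 正是 MIDI 編號 69
f = midi_to_hz(60.5)      # C(60)與 C#(61)中間的音高
```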

改變音高

音高可以由多種不同的性質,如高或低、斷或續、是否隨時間改變(稱為啁啾,若有,則以何種方式改變,如滑奏、滑音、震音等)以及可定或不定…等來定義。在音樂上,音高與其他音高之間的關係比起音高本身的頻率多少來得重要。兩個音的關係可以用比例或者是之間的頻率差距(以音分表示)來代表。可以明確感受到這些關係的人稱為擁有相對音感,至於能夠感知一音高的頻率高低而不假其他音高的人則被稱為擁有絕對音感。

音色

音色是聲音的特色。不同音色的聲音,即使在同一音高和同一響度的情況下,也能讓人區分開來。同樣的音高和響度配上不同的音色,就好比同樣的色度和明度配上不同的色相一樣。

定義

美國國家標準協會將音色定義為「……一種感官屬性,使聽者可以根據它判斷出兩個具有相同的響度和音高的音是不相似的。(that attribute of sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar)」對1960年的定義的注釋中(第45頁)增加了「音色主要決定於聲音頻譜對人的刺激,但也決定於波形、聲壓、頻譜的頻率位置和頻譜對人的時間性刺激。」

原理

聲音是由發聲的物體震動產生的。當發聲物體的主體震動時會發出一個基音,同時其餘各部分也有複合的震動,這些震動組合產生泛音。正是這些泛音決定了發聲物體的音色,使人能辨別出不同的樂器甚至不同的人發出的聲音。所以根據音色的不同可以劃分出男音和女音;高音、中音和低音;弦樂和管樂等。

所有泛音都比基音的頻率高,但強度都相當弱,否則就無法調準樂器的音高了。
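基音相同而泛音比重不同,音色便不同。以下以 numpy 疊加幾個整數倍分音作為示意(各分音的權重是隨意假設的,僅為對照用):

```python
import numpy as np

def harmonic_tone(f0, weights, sr=22050, duration=1.0):
    """疊加基音 f0 與其整數倍泛音;weights 為各分音的相對強度(示意用)。"""
    t = np.arange(int(sr * duration)) / sr
    tone = sum(w * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, w in enumerate(weights))
    return tone / np.abs(tone).max()    # 正規化,避免振幅超出範圍

# 同為 220 Hz 基音,泛音比重不同,聽來音色即不同
bright = harmonic_tone(220, [1.0, 0.8, 0.6, 0.4])   # 泛音較強,音色較亮
dull   = harmonic_tone(220, [1.0, 0.1, 0.05])       # 泛音微弱,音色較暗
```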

拍 (音樂)

在音樂的實踐及理論中,拍(子)是時間的基本單位,是拍層[1][2]上的脈衝。拍子一般是人們聽音樂時腳尖頓地的節奏,或是樂手表演時數的數字(數數其實理論上不一定正確,往往數成最近的一層合拍)。作為俗稱,「拍子」一詞可指速度、節拍、特定的節奏律動感等一系列相關的概念。

音樂中的節奏以重拍和非重拍(通常稱作「強拍」和「弱拍」)的重複序列為特徵,並依照拍號和速度的指示分割成小節

比拍層更快的節拍等級是分拍層,更慢的是合拍層。拍子自古以來是音樂的重要元素。有些音樂流派,如迪斯科,一般會削弱拍子的重要性,而放克則強調拍子,以便配合舞蹈。[3]

節拍的層級:中間是拍層,上面是分拍層,下面是合拍層。

速度 (音樂)

傳統音樂的五線譜上,「Allegro(♩=120)」標示了樂曲的速度:120 是 BPM 值,表示每分鐘演奏 120 個四分音符,即每個四分音符的長度等於 1 分鐘除以 120 等分 = 0.5 秒;拍子記號是 4/4 拍,1 小節就是 0.5 秒乘 4 拍 = 2 秒長。

速度(英語:tempo)決定了一段音樂的快慢,是音樂的重要元素,亦影響作品的情感與演奏難度。「tempo」是義大利語的「時間」,源於拉丁語的「tempus」。

量度音樂速度

「Wittner」樣式的電子節拍器,旋轉轉盤將指針調校到需要的BPM上使用。

「Seth Thomas」樣式,傳統機械彈簧式的節拍器。

音樂速度一般以文字或數字標記於樂曲的開端,現代習慣以每分鐘多少拍(beats per minute,BPM)作量度單位,這表示一個指定的音符,例如四分音符在一分鐘內出現的次數,BPM的數值越大代表越快的速度。

電子數碼音樂MIDI及其它電腦音樂序列程式的檔案及界面都應用了BPM來表示速度。
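BPM 與拍長的換算正如前述:每拍秒數是 60 除以 BPM,一小節的長度再乘上每小節拍數。小小示意(函數名為說明所取):

```python
def beat_seconds(bpm):
    """BPM 換算每拍秒數:一分鐘除以每分鐘拍數。"""
    return 60.0 / bpm

def bar_seconds(bpm, beats_per_bar=4):
    """一小節的長度(秒)。"""
    return beat_seconds(bpm) * beats_per_bar

# Allegro(四分音符 = 120):每拍 0.5 秒,4/4 拍一小節 2 秒
half = beat_seconds(120)
bar = bar_seconds(120, 4)
```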

 

倘說了知其中『感覺有我』乙事,自曉『心物同源』萬法,豈能不知 IPython 之所以重『視聽呈現』功能勒☆

 

Important

This documentation covers IPython versions 6.0 and higher. Beginning with version 6.0, IPython stopped supporting compatibility with Python versions lower than 3.3 including all versions of Python 2.7.

If you are looking for an IPython version compatible with Python 2.7, please use the IPython 5.x LTS release and refer to its documentation (LTS is the long term support release).

Module: display

Public API for display tools in IPython.

 

 

 

方特舉經驗為師之科學精神矣◎

Shepard tone

A Shepard tone, named after Roger Shepard, is a sound consisting of a superposition[disambiguation needed] of sine waves separated by octaves. When played with the bass pitch of the tone moving upward or downward, it is referred to as the Shepard scale. This creates the auditory illusion of a tone that continually ascends or descends in pitch, yet which ultimately seems to get no higher or lower.[1]

A spectrum view of ascending Shepard tones on a linear frequency scale.
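Shepard tone 的合成要點正如上文:疊加相隔八度的正弦波,再以一個沿對數頻率軸起伏的振幅包絡,讓最高與最低的分量淡入淡出,造成音高循環的錯覺。以下是 numpy 示意草稿(基頻、八度數與包絡形狀皆為假設值):

```python
import numpy as np

def shepard_tone(base_freq=55.0, n_octaves=8, sr=22050, duration=1.0):
    """以相隔八度的正弦波疊加出一個(靜態的)Shepard tone。
    各分量的振幅依其在對數頻率軸上的位置取高斯包絡,
    使最高與最低的分量幾近無聲。"""
    t = np.arange(int(sr * duration)) / sr
    tone = np.zeros_like(t)
    center = n_octaves / 2.0            # 包絡中心(以八度計)
    for k in range(n_octaves):
        f = base_freq * 2 ** k          # 相隔八度的各分量頻率
        amp = np.exp(-0.5 * ((k - center) / (n_octaves / 4.0)) ** 2)
        tone += amp * np.sin(2 * np.pi * f * t)
    return tone / np.abs(tone).max()

x = shepard_tone()
```

讓包絡中心隨時間緩緩移動,串接起來便是不斷上升(或下降)卻似乎永遠到不了頂的 Shepard scale。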

 

 

 

 

 

 

 

 

 

 

【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-4 》

俗話講︰隔行如隔山!為什麼呢?若以人能應用工具而言,實費疑猜乎 !?果真各行各業有『奇巧裝置』耶?!恐是落在規矩和行話迷霧哩 ??!!故耳過去雖曾註解神經網絡與深度學習乙事︰

對一本小巧完整而且寫的好的書,又該多說些什麼的呢?於是幾經思慮,就講講過去之閱讀隨筆與念頭雜記的吧!終究面對一個既舊也新的議題,尚待火石電光激發創意和發想。也許只需一個洞見或將改變人工智慧的未來乎??

Michael Nielsen 先生開宗明義在首章起頭

The human visual system is one of the wonders of the world. Consider the following sequence of handwritten digits:

Most people effortlessly recognize those digits as 504192. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. And yet human vision involves not just V1, but an entire series of visual cortices – V2, V3, V4, and V5 – doing progressively more complex image processing. We carry in our heads a supercomputer, tuned by evolution over hundreds of millions of years, and superbly adapted to understand the visual world. Recognizing handwritten digits isn’t easy. Rather, we humans are stupendously, astoundingly good at making sense of what our eyes show us. But nearly all that work is done unconsciously. And so we don’t usually appreciate how tough a problem our visual systems solve.

The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize digits like those above. What seems easy when we do it ourselves suddenly becomes extremely difficult. Simple intuitions about how we recognize shapes – “a 9 has a loop at the top, and a vertical stroke in the bottom right” – turn out to be not so simple to express algorithmically. When you try to make such rules precise, you quickly get lost in a morass of exceptions and caveats and special cases. It seems hopeless.

Neural networks approach the problem in a different way. The idea is to take a large number of handwritten digits, known as training examples,

 

and then develop a system which can learn from those training examples. In other words, the neural network uses the examples to automatically infer rules for recognizing handwritten digits. Furthermore, by increasing the number of training examples, the network can learn more about handwriting, and so improve its accuracy. So while I’ve shown just 100 training digits above, perhaps we could build a better handwriting recognizer by using thousands or even millions or billions of training examples.

In this chapter we’ll write a computer program implementing a neural network that learns to recognize handwritten digits. The program is just 74 lines long, and uses no special neural network libraries. But this short program can recognize digits with an accuracy over 96 percent, without human intervention. Furthermore, in later chapters we’ll develop ideas which can improve accuracy to over 99 percent. In fact, the best commercial neural networks are now so good that they are used by banks to process cheques, and by post offices to recognize addresses.

We’re focusing on handwriting recognition because it’s an excellent prototype problem for learning about neural networks in general. As a prototype it hits a sweet spot: it’s challenging – it’s no small feat to recognize handwritten digits – but it’s not so difficult as to require an extremely complicated solution, or tremendous computational power. Furthermore, it’s a great way to develop more advanced techniques, such as deep learning. And so throughout the book we’ll return repeatedly to the problem of handwriting recognition. Later in the book, we’ll discuss how these ideas may be applied to other problems in computer vision, and also in speech, natural language processing, and other domains.

Of course, if the point of the chapter was only to write a computer program to recognize handwritten digits, then the chapter would be much shorter! But along the way we’ll develop many key ideas about neural networks, including two important types of artificial neuron (the perceptron and the sigmoid neuron), and the standard learning algorithm for neural networks, known as stochastic gradient descent. Throughout, I focus on explaining why things are done the way they are, and on building your neural networks intuition. That requires a lengthier discussion than if I just presented the basic mechanics of what’s going on, but it’s worth it for the deeper understanding you’ll attain. Amongst the payoffs, by the end of the chapter we’ll be in position to understand what deep learning is, and why it matters.

………

說明這本書的主旨。是用『手寫阿拉伯數字辨識』這一主題貫串『神經網絡』以及『深度學習』之點滴,希望讀者能夠藉著最少的文本一窺全豹、聞一知十。因此他盡量少用『數學』,盡可能白話描述重要的『原理』與『概念』。

─── 摘自《W!o+ 的《小伶鼬工坊演義》︰神經網絡與深度學習【發凡】

 

心中篤定啊◎

縱也曾寫過

W!o+ 的《小伶鼬工坊演義》︰神經網絡【FFT】一

若干快速傅立葉變換相關視、聽小品,至今怕讀樂譜以及豆芽菜

的呦 !! ??

因此非常樂於推薦 CCRMA

Julius Orion Smith III

Home Page

Online Books

  1. Mathematics of the Discrete Fourier Transform (DFT)
  2. Introduction to Digital Filters
  3. Physical Audio Signal Processing
  4. Spectral Audio Signal Processing

All Publications in Chronological Order

 

先生公開之線上書也☆

MATHEMATICS OF THE DISCRETE FOURIER TRANSFORM (DFT) WITH AUDIO APPLICATIONS

SECOND EDITION

JULIUS O. SMITH III
Center for Computer Research in Music and Acoustics (CCRMA)


Preface

The Discrete Fourier Transform (DFT) can be understood as a numerical approximation to the Fourier transform. However, the DFT has its own exact Fourier theory, which is the main focus of this book. The DFT is normally encountered in practice as a Fast Fourier Transform (FFT)–i.e., a high-speed algorithm for computing the DFT. FFTs are used extensively in a wide range of digital signal processing applications, including spectrum analysis, high-speed convolution (linear filtering), filter banks, signal detection and estimation, system identification, audio compression (e.g., MPEG-II AAC), spectral modeling sound synthesis, and many other applications; some of these will be discussed in Chapter 8.

This book started out as a series of readers for my introductory course in digital audio signal processing that I have given at the Center for Computer Research in Music and Acoustics (CCRMA) since 1984. The course was created primarily for entering Music Ph.D. students in the Computer Based Music Theory program at CCRMA. As a result, the only prerequisite is a good high-school math background, including some calculus exposure.
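上引序文點明:DFT 依定義是 X[k] = Σₙ x[n]·e^(−2πikn/N),而 FFT 只是計算同一結果的高速演算法。以下以 numpy 直接依定義實作並與 np.fft.fft 對照,作為示意:

```python
import numpy as np

def naive_dft(x):
    """直接依定義計算 DFT:X[k] = sum_n x[n] * e^{-2*pi*i*k*n/N}。
    複雜度 O(N^2);FFT 以 O(N log N) 算出完全相同的結果。"""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT 矩陣
    return W @ x

x = np.random.default_rng(0).standard_normal(64)
X = naive_dft(x)
# 與 np.fft.fft(x) 在數值誤差內完全一致
```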

……

SPECTRAL AUDIO SIGNAL PROCESSING

JULIUS O. SMITH III
Center for Computer Research in Music and Acoustics (CCRMA)

Preface

This book precipitated from my “spectral modeling” course which has been offered at the Center for Computer Research in Music and Acoustics (CCRMA) since 1984. The course originally evolved as a dissemination vehicle for spectral-oriented signal-processing research in computer music, aimed at beginning graduate students in computer music and engineering programs et al. Over the years it has become more of a tour of fundamentals in spectral audio signal processing, with occasional mention and citation of prior and ongoing related research. In principle, the only prerequisites are the first two books in the music signal processing series [264,263].

The focus of this book is on spectral modeling applied to audio signals. More completely, the principal tasks are spectral analysis, modeling, and resynthesis (and/or effects). We analyze sound in terms of spectral models primarily because this is what the human brain does. We may synthesize/modify sound in terms of spectral models for the same reason.

The primary tool for audio spectral modeling is the short-time Fourier transform (STFT). The applications we will consider lie in the fields of audio signal processing and musical sound synthesis and effects.

The reader should already be familiar with the Fourier transform and elementary digital signal processing. One source of this background material is [264]. Some familiarity with digital filtering and associated linear systems theory, e.g., on the level of [263], is also assumed.

There is a notable absence in this book of emphasis on audio coding of spectral representations. While audio coding is closely related, there are other books which cover this topic in detail (e.g., [273,16,159]). On the other hand, comparatively few works address applications of spectral modeling in areas other than audio compression. This book attempts to help fill that gap.

 

 

 

 

 

 

 

 

【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-3 》

通常熟悉教材思路模式利於掌握學習進入狀況。故此先講

輸入‧處理‧輸出模型︰

IPO model

The input–process–output (IPO) model, or input-process-output pattern, is a widely used approach in systems analysis and software engineering for describing the structure of an information processing program or other process. Many introductory programming and systems analysis texts introduce this as the most basic structure for describing a process.[1][2][3][4]

Overview

A computer program or any other sort of process using the input-process-output model receives inputs from a user or other source, does some computations on the inputs, and returns the results of the computations.[1] In essence the system separates itself from the environment, thus defining both inputs and outputs, as one united mechanism.[5] The system would divide the work into two categories:

  • A requirement from the environment (input)
  • A provision for the environment (output)

In other words, such inputs may be materials, human resources, money or information, transformed into outputs, such as consumables, services, new information or money.

As a consequence, the input-process-output system is very vulnerable to misinterpretation. This is because, theoretically, it contains all the data regarding the environment outside the system, yet in practice the environment contains a significant variety of objects that the system is unable to comprehend, as they exist outside the system's control. As a result, it is very important to understand where the boundary lies between the system and the environment, which is beyond the system's understanding. Often various analysts would set their own boundaries, favouring their point of view, thus creating much confusion.[6]

 

指明其與 IPython

讀取‧求值‧輸出循環

Read–eval–print loop

A Read–Eval–Print Loop (REPL), also known as an interactive toplevel or language shell, is a simple, interactive computer programming environment that takes single user inputs (i.e. single expressions), evaluates them, and returns the result to the user; a program written in a REPL environment is executed piecewise. The term is most usually used to refer to programming interfaces similar to the classic Lisp machine interactive environment. Common examples include command line shells and similar environments for programming languages, and is particularly characteristic of scripting languages.[1]

Uses

As a shell, a REPL environment allows users to access relevant features of an operating system in addition to providing access to programming capabilities.

The most common use for REPLs outside of operating system shells is for instantaneous prototyping. Other uses include mathematical calculation, creating documents that integrate scientific analysis (e.g. IPython), interactive software maintenance, benchmarking, and algorithm exploration.

A REPL can become an essential part of learning a new language as it gives quick feedback to the novice.

 

密切相關也。這可說是

Jupyter Audio Basics》文章綱要哩!

舉例而言︰

聲音檔輸入

import librosa
x, sr = librosa.load('audio/simple_loop.wav')

 

視覺化輸出

%matplotlib inline
import seaborn # optional
import matplotlib.pyplot as plt
import librosa.display

plt.figure(figsize=(12, 4))
librosa.display.waveplot(x, sr=sr)

 

stft 處理及顯示

X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(X)
plt.figure(figsize=(12, 5))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')

 

聽覺化輸出

import IPython.display as ipd
ipd.Audio('audio/simple_loop.wav') # load a local WAV file

 

 

 

還請讀者自讀原文,品嚐玩味的吧!

僅補之以

System shell commands

To run any command at the system shell, simply prefix it with !, e.g.:

!ping www.bbc.co.uk

You can capture the output into a Python list, e.g.: files = !ls. To pass the values of Python variables or expressions to system commands, prefix them with $: !grep -rF $pattern ipython/*. See our shell section for more details.

Define your own system aliases

It’s convenient to have aliases to the system commands you use most often. This allows you to work seamlessly from inside IPython with the same commands you are used to in your system shell. IPython comes with some pre-defined aliases and a complete system for changing directories, both via a stack (see %pushd, %popd and %dhist) and via direct %cd. The latter keeps a history of visited directories and allows you to go to any previously visited one.

 

之說明及用法,了其方便性呦◎

 

 

 

 

 

 

 

 

【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-2 》

雖然有了軟件工具,面對大量 MIR 材料,實難落筆,不知打哪講起哩?心想反正自己亦是新手,何不就隨手寫點學習筆記摘要鏈結吧!或許於人有益也說不定?!

如果讀過簡介

 

多遍以上,將會清楚知道這是以派生 Python 語言為核心,多種程式庫為輔翼,藉著 Jupyter 互動環境,談論音樂資訊檢索的方方面面︰

1. About This Site

About This Site

musicinformationretrieval.com is a collection of instructional materials for music information retrieval (MIR). These materials contain a mix of casual conversation, technical discussion, and Python code.

These pages, including the one you’re reading, are authored using Jupyter notebooks. They are statically hosted using GitHub Pages. The GitHub repository is found here: stevetjoa/stanford-mir.

This material is used during the annual Summer Workshop on Music Information Retrieval at CCRMA, Stanford University. Click here for workshop description and registration.

This site is maintained by Steve Tjoa. For questions, please email steve@stevetjoa.com. Do you have any feedback? Did you find errors or typos? Are you a teacher or researcher and would like to collaborate? Please let me know.

 

2. What is MIR?

While you listen to these excerpts, name as many of its musical characteristics as you can. Can you name the genre? tempo? instruments? mood? time signature? key signature? chord progression? tuning frequency? song structure?

 

What is MIR?

Here is a sampling of tasks found in music information retrieval:

  • fingerprinting
  • cover song detection
  • genre recognition
  • transcription
  • recommendation
  • symbolic melodic similarity
  • mood
  • source separation
  • instrument recognition
  • pitch tracking
  • tempo estimation
  • score alignment
  • song structure/form
  • beat tracking
  • key detection
  • query by humming

 

Why MIR?

  • discover, organize, monetize media collections
  • search (“find me something that sounds like this”) songs, loops, speech, environmental sounds, sound effects
  • workflows in consumer products through machine hearing
  • automatic control of software and mobile devices

 

How is MIR done?

Well, that’s a big question. Two primary areas in music analysis include tonal analysis (e.g. melody and harmony) and rhythm and tempo (e.g. beat tracking). Here are some great overviews by Meinard Müller (author, FMP) on both topics.

 

3. Python Basics

Python Basics

Why Python?

Python is a general-purpose programming language that is popular and easy to use. For new programmers, it is a great choice as a first programming language. In fact, more and more university CS departments are centering their introductory courses around Python.

For a summary of reasons to move from Matlab to Python, please read this post.

This page on Udacity provides some more great reasons to use Python, along with resources for getting started.

 

4. Jupyter Basics

Jupyter Basics

You are looking at a Jupyter Notebook, an interactive Python shell inside of a web browser. With it, you can run individual Python commands and immediately view their output. It’s basically like the Matlab Desktop or Mathematica Notebook but for Python.

To start an interactive Jupyter notebook on your local machine, read the instructions at the GitHub README for this repository.

If you are reading this notebook on http://musicinformationretrieval.com, you are viewing a read-only version of the notebook, not an interactive version. Therefore, the instructions below do not apply.

 

5. Jupyter Audio Basics

Audio Libraries

We will mainly use two libraries for audio acquisition and playback:

1. librosa

librosa is a Python package for music and audio processing by Brian McFee. A large portion was ported from Dan Ellis’s Matlab audio processing examples.

2. IPython.display.Audio

IPython.display.Audio lets you play audio directly in an IPython notebook.

 

6. NumPy and SciPy Basics

NumPy and SciPy

 

The quartet of NumPy, SciPy, Matplotlib, and IPython is a popular combination in the Python world. We will use each of these libraries in this workshop.

 

7. Alphabetical Index of Terms

Alphabetical Index of Terms

Term                    | musicinformationretrieval.com | Wikipedia                  | librosa              | FMP    | Related
Energy                  | Energy and RMSE               | Energy (signal processing) |                      | 66, 67 | Root-mean-square energy
Root-mean-square energy | Energy and RMSE               | Root mean square           | librosa.feature.rmse |        | Energy
Spectrogram             | STFT and Spectrogram          | Spectrogram                |                      | 29, 55 | STFT
Short-time Fourier transform

 

故而前行者盡快先能掌握的呦!!??