【鼎革‧革鼎】 (Reform and Renewal): Raspbian Stretch 《6: K.3 - Speech Interface - 7.2G》

In a world of 'noise' and 'interference', even saying a simple 'hello':

 

means keeping company with 'statistics' and 'uncertainty'!?

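So, before the 【鼎革‧革鼎】… chapters come to a close, a special word on the 'importance' of 'knowing only after learning'; perhaps even 'machines' cannot escape it?!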

……

We can appreciate why we need additional intelligence in our systems: heuristics don’t go very far in the world of complex audio signals. We’ll be using scikit-learn’s implementation of the k-NN algorithm for our work here. It proves to be a straightforward and easy-to-use implementation. The steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers.
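To make the idea behind k-NN concrete, here is a minimal hand-rolled sketch in plain Python. It is not scikit-learn's implementation (there the classifier is `KNeighborsClassifier` with `fit` and `predict`); the function name `knn_predict`, the two-dimensional features, and the labels are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training examples by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D audio features (made-up numbers, not measured data).
features = [(0.05, 0.40), (0.07, 0.50), (0.06, 0.45),
            (0.30, 2.10), (0.28, 1.90), (0.33, 2.30)]
labels = ["hum", "hum", "hum", "hiss", "hiss", "hiss"]

print(knn_predict(features, labels, (0.29, 2.0), k=3))
```

The classifier stores the training set and defers all work to query time, which is why it is so easy to set up and also why it slows down as the training set grows.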

 

One wonders whether 'rewriting' the Python 2

 

A Python library for audio feature extraction, classification, segmentation and applications

This doc contains general info. The complete wiki is at https://github.com/tyiannak/pyAudioAnalysis/wiki

General

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

  • Extract audio features and representations (e.g. mfccs, spectrogram, chromagram)
  • Classify unknown sounds
  • Train, parameter tune and evaluate classifiers of audio segments
  • Detect audio events and exclude silence periods from long recordings
  • Perform supervised segmentation (joint segmentation – classification)
  • Perform unsupervised segmentation (e.g. speaker diarization)
  • Extract audio thumbnails
  • Train and use audio regression models (example application: emotion recognition)
  • Apply dimensionality reduction to visualize audio data and content similarities

 

'library' for Python 3 might be an 'experience' that brings some insight ◎

 

 

 

 

 

 

 

 

【鼎革‧革鼎】 (Reform and Renewal): Raspbian Stretch 《6: K.3 - Speech Interface - 7.2F》

Climb high, gaze far, and look back on the road traveled; perhaps the direction becomes clearer!?

 

On this business of the 'Raspberry Pi tucking Mount Tai under its arm and leaping over the North Sea':

Project DeepSpeech

Project DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques, based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow project to make the implementation easier.

 

Pre-built binaries that can be used for performing inference with a trained model can be installed with pip. Proper setup using a virtual environment is recommended, and you can find that documented below.

A pre-trained English model is available for use, and can be downloaded using the instructions below.

Once everything is installed you can then use the deepspeech binary to do speech-to-text on short, approximately 5 second, audio files (currently only WAVE files with 16-bit, 16 kHz, mono are supported in the Python client):

pip install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
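Since the Python client only accepts 16-bit, 16 kHz, mono WAVE files, it can save a failed run to check a file before handing it over. The following is a minimal sketch using only the standard-library `wave` module; the helper names `check_wav` and `make_test_wav` are hypothetical and not part of DeepSpeech.

```python
import io
import wave

def check_wav(source):
    """Return a list of format problems; an empty list means 16-bit, 16 kHz, mono."""
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {8 * wav.getsampwidth()}-bit, expected 16-bit")
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems

def make_test_wav(rate, channels):
    """Build a short silent 16-bit WAVE file in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (channels * (rate // 10)))
    buf.seek(0)
    return buf

print(check_wav(make_test_wav(16000, 1)))   # a conforming file
print(check_wav(make_test_wav(44100, 2)))   # a typical CD-style file
```

`check_wav` also accepts a path such as `my_audio_file.wav`, because `wave.open` takes either a file name or a file-like object.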

Alternatively, quicker inference (the realtime factor on a GeForce GTX 1070 is about 0.44) can be performed using a supported NVIDIA GPU on Linux. (See the release notes to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav

See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check required runtime dependencies).

 

This is in the hope of sparing you from 'searching for it in the crowd a thousand times over'?!

 

 

 

 

 

 

 

 

 

【鼎革‧革鼎】 (Reform and Renewal): Raspbian Stretch 《6: K.3 - Speech Interface - 7.2E》

Why would anyone be interested in analyzing 'noise'? Because it is almost 'everywhere'! Let us hear what Julius O. Smith III has to say:

SPECTRAL AUDIO SIGNAL PROCESSING

JULIUS O. SMITH III
Center for Computer Research in Music and Acoustics (CCRMA)

……

Why Analyze Noise?

An example application of noise spectral analysis is denoising, in which noise is to be removed from some recording. On magnetic tape, for example, “tape hiss” is well modeled mathematically as a noise process. If we know the noise level in each frequency band (its power level), we can construct time-varying band gains to suppress the noise when it is audible. That is, the gain in each band is close to 1 when the music is louder than the noise, and close to 0 when the noise is louder than the music. Since tape hiss is well modeled as stationary (constant in nature over time), we can estimate the noise level during periods of “silence” on the tape.
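The band-gain idea can be sketched in a few lines. The passage does not prescribe a particular gain formula, so the Wiener-style rule used here (estimated music power divided by estimated music-plus-noise power) is an assumption, as are the function names and the made-up band powers.

```python
def estimate_noise_power(silent_frames):
    """Average the band powers over frames assumed to contain only noise."""
    n_bands = len(silent_frames[0])
    return [sum(frame[b] for frame in silent_frames) / len(silent_frames)
            for b in range(n_bands)]

def band_gains(frame_power, noise_power):
    """Wiener-style gain per band: near 1 when music dominates, near 0 when noise does."""
    gains = []
    for total, noise in zip(frame_power, noise_power):
        music = max(total - noise, 0.0)   # power left after removing the noise estimate
        denom = music + noise
        gains.append(music / denom if denom > 0 else 1.0)
    return gains

# Two "silent" frames with three frequency bands each (illustrative numbers).
silent_frames = [[1.0, 1.5, 0.5], [1.0, 0.5, 1.5]]
noise = estimate_noise_power(silent_frames)

# One frame with loud music in band 0, music equal to noise in band 1,
# and almost nothing but noise in band 2.
gains = band_gains([101.0, 2.0, 1.05], noise)
print([round(g, 3) for g in gains])
```

Because tape hiss is treated as stationary, the noise estimate is computed once from the silent frames and then reused for every later frame, while the gains themselves change from frame to frame.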

Another application of noise spectral analysis is spectral modeling synthesis (the subject of §10.4). In this sound modeling technique, sinusoidal peaks are measured and removed from each frame of a short-time Fourier transform (sequence of FFTs over time). The remaining signal energy, whatever it may be, is defined as “noise” and resynthesized using white noise through a filter determined by the upper spectral envelope of the “noise floor”.

───

 

If we know the 'noise characteristics' of □ ○, might we be able to improve the 'signal-to-noise ratio'?!

Signal-to-noise ratio

Signal-to-noise ratio (abbreviated SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise.

S/N ratio is defined as the ratio of signal power to the noise power, often expressed in decibels. A ratio higher than 1:1 (greater than 0 dB) indicates more signal than noise.
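As a small worked example of the definition, the ratio in decibels is 10·log10(P_signal / P_noise), or equivalently 20·log10 of the ratio of RMS amplitudes. The function names below are hypothetical.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)

def snr_db_from_rms(signal_rms, noise_rms):
    """Same quantity from RMS amplitudes (power is proportional to amplitude squared)."""
    return 20.0 * math.log10(signal_rms / noise_rms)

print(snr_db(1.0, 1.0))     # equal power: the 1:1 case
print(snr_db(100.0, 1.0))   # signal 100 times stronger than the noise
print(snr_db(0.5, 1.0))     # more noise than signal gives a negative value
```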

While SNR is commonly quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice core or biochemical signaling between cells or financial trading signals).

The signal-to-noise ratio, the bandwidth, and the channel capacity of a communication channel are connected by the Shannon–Hartley theorem.
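The Shannon–Hartley theorem states C = B·log2(1 + S/N), with C in bits per second, B in hertz, and S/N as a linear power ratio (not decibels). A brief sketch follows; the 3 kHz bandwidth and 30 dB figure are illustrative assumptions, not values from the text.

```python
import math

def db_to_linear(value_db):
    """Convert a power ratio in decibels to a linear ratio."""
    return 10.0 ** (value_db / 10.0)

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley upper bound on the error-free bit rate, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

capacity = channel_capacity(3000.0, db_to_linear(30.0))
print(f"{capacity:.0f} bit/s")
```

Doubling the bandwidth doubles the bound, whereas doubling an already large S/N adds only about B bits per second, which is why a poor signal-to-noise ratio is so costly.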

Signal-to-noise ratio is sometimes used metaphorically to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. For example, in online discussion forums and other online communities, off-topic posts and spam are regarded as “noise” that interferes with the “signal” of appropriate discussion.[1]

 

By analogy with that 'white noise', would this not be the point of departure for voice activity detection

Voice activity detection

Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected.[1] The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech sections of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol applications, saving on computation and on network bandwidth.

VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually language independent.
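The simplest member of this family of algorithms is a short-time energy detector: cut the signal into frames and mark a frame as active when its mean squared amplitude exceeds a threshold. The sketch below assumes that approach; real VADs add spectral features, smoothing, and hangover logic. The function names, frame length, threshold, and test tones are all hypothetical.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_speech(samples, frame_len, threshold):
    """One boolean per frame: True where the frame energy exceeds the threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

def tone(amplitude, n_samples, freq_hz=440.0, rate_hz=16000):
    """A sine tone standing in for either faint background or loud 'speech'."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

FRAME = 160  # 10 ms at 16 kHz
samples = tone(0.01, 3 * FRAME) + tone(0.5, 2 * FRAME) + tone(0.01, FRAME)
flags = detect_speech(samples, FRAME, threshold=0.01)
print(flags)
```

The trade-offs named in the passage show up directly in the two parameters: a shorter frame lowers latency but makes the energy estimate noisier, and a lower threshold raises sensitivity at the cost of more false alarms.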

It was first investigated for use on time-assignment speech interpolation (TASI) systems.[2]

 

and its 'theoretical threshold'!?

Yet how can every 'unwanted' signal simply be treated as 'noise'??

Signal-to-interference-plus-noise ratio

In information theory and telecommunication engineering, the signal-to-interference-plus-noise ratio (SINR[1]) (also known as the signal-to-noise-plus-interference ratio (SNIR)[2]) is a quantity used to give theoretical upper bounds on channel capacity (or the rate of information transfer) in wireless communication systems such as networks. Analogous to the SNR used often in wired communications systems, the SINR is defined as the power of a certain signal of interest divided by the sum of the interference power (from all the other interfering signals) and the power of some background noise. If the power of noise term is zero, then the SINR reduces to the signal-to-interference ratio (SIR). Conversely, zero interference reduces the SINR to the signal-to-noise ratio (SNR), which is used less often when developing mathematical models of wireless networks such as cellular networks.[3]

The complexity and randomness of certain types of wireless networks and signal propagation have motivated the use of stochastic geometry models in order to model the SINR, particularly for cellular or mobile phone networks.[4]

Description

SINR is commonly used in wireless communication as a way to measure the quality of wireless connections. Typically, the energy of a signal fades with distance, which is referred to as a path loss in wireless networks. Conversely, in wired networks the existence of a wired path between the sender or transmitter and the receiver determines the correct reception of data. In a wireless network one has to take other factors into account (e.g. the background noise and the interfering strength of other simultaneous transmissions). The concept of SINR attempts to create a representation of this aspect.

Mathematical definition

SINR is usually defined for a particular receiver (or user). In particular, for a receiver located at some point x in space (usually, on the plane), its corresponding SINR is given by

  \mathrm{SINR}(x) = \frac{P}{I + N}

where P is the power of the incoming signal of interest, I is the interference power of the other (interfering) signals in the network, and N is some noise term, which may be a constant or random. Like other ratios in electronic engineering and related fields, the SINR is often expressed in decibels or dB.
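A short numerical sketch of the definition, with P the power of the signal of interest, I the summed interference power, and N the noise power. The function names and the sample powers are hypothetical; the two special cases mirror the reductions to SNR and SIR described above.

```python
import math

def sinr(signal_power, interference_powers, noise_power):
    """P / (I + N), where I is the sum of all interfering signal powers."""
    return signal_power / (sum(interference_powers) + noise_power)

def to_db(ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(ratio)

print(sinr(8.0, [1.0, 2.0], 1.0))          # general case
print(sinr(10.0, [], 2.0))                 # no interference: reduces to SNR
print(sinr(10.0, [3.0, 2.0], 0.0))         # no noise: reduces to SIR
print(round(to_db(sinr(8.0, [1.0, 2.0], 1.0)), 2), "dB")
```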

 

Today's 'speech technology' is striving to grow beyond its own

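Babbling

Babbling is a stage in child language development. During this period of language acquisition, the infant appears to be experimenting with uttering sounds but does not yet produce any recognizable words. Babbling begins shortly after birth and progresses through several stages as the infant's repertoire of sounds expands and its vocalizations become more speech-like.[1] Infants typically begin to produce recognizable words at around 12 months of age, although babbling may continue for some time afterward.[2] Babbling can be seen as a precursor to language development or simply as vocal experimentation. The physical structures involved in babbling are still developing during the child's first year.[3] This continued physical development is responsible for some of the changes in ability and for the variations in sound that infants can produce. Abnormal development, such as certain medical conditions, developmental delays, and hearing impairment, may interfere with a child's ability to babble normally. Although there is still disagreement about whether language is unique to humans, babbling is not unique to the human species.[4]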

 

babbling stage ☆

 

 

 

 

 

 

 

Back to work?!

Why should one picture

Circle_of_confusion_calculation_diagram.svg

and one formula

c = A \frac{| S_2 - S_1 |}{S_2} \frac{f}{S_1 - f}

= \frac{| S_2 - S_1 |}{S_2} \frac{f^2}{N (S_1 - f)}

where A = \frac{f}{N}

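be so confusing? Only if one does not know that it says the 'resolution' of the human eye has a limit, and uses that limit to define the boundary between 'blurred' and 'sharp'. Even without bringing in the 'aperture', a lens has its own edge, A. Moreover, the 'imaging condition' means that only one object distance, S_1, is brought to perfect focus, at the image distance f_1. By this reasoning, objects at other distances S_2, nearer or farther, form a 'circle of confusion' on the image plane; if it is small enough for a person to regard it as a 'point', the eye has no choice but to judge the image 'sharp'. Although the formula looks complicated and involves several parameters, f and N are intrinsic parameters of the optical system; in effect it takes the 'focused object' at S_1 as the reference and describes the size of the 'blur circle' produced by another object at S_2 'relative' to it, nothing more. In the following two cases, c simplifies: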

【Focused at infinity】 S_1 \to \infty

c = \frac{f^2}{N S_2} , independent of S_1.

【Object at infinity】 S_2 \to \infty

c = \frac{f^2}{N (S_1 - f)} , independent of S_2.
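The two limiting cases can be checked numerically against the general formula. This is only a sketch: the function name is hypothetical, and the 50 mm, f/2 lens with objects at 2 m and 4 m is an assumed example, with all lengths in millimetres.

```python
import math

def circle_of_confusion(f, N, s1, s2):
    """Blur-circle diameter c for a lens focused at s1, looking at an object at s2."""
    return abs(s2 - s1) / s2 * f ** 2 / (N * (s1 - f))

f, N = 50.0, 2.0            # focal length in mm and f-number (assumed values)
s1, s2 = 2000.0, 4000.0     # focused distance and other object distance in mm

print(circle_of_confusion(f, N, s1, s2))

# Focused at infinity: c approaches f^2 / (N * S_2), independent of S_1.
print(circle_of_confusion(f, N, 1e12, s2), f ** 2 / (N * s2))

# Object at infinity: c approaches f^2 / (N * (S_1 - f)), independent of S_2.
print(circle_of_confusion(f, N, s1, 1e12), f ** 2 / (N * (s1 - f)))
```

An object exactly at the focused distance gives c = 0, and the blur grows with the square of the focal length and shrinks as the f-number N increases, which is the usual depth-of-field behaviour.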

Perhaps one might first ponder what this implies!!

─── From "The World of Light: [□○ Reading] On the Eye (9)"

 

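Perhaps the human ear's ability to lock onto the one it is listening to is simply too good, and too natural! Just as a lover's eyes see only that one person?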

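and so there is no such thing for it as a

Threshold (閾, pinyin: [yù], zhuyin: ㄩˋ), or threshold value (閾值, pinyin: [yùzhí], zhuyin: [ㄩˋㄓˊ]), is also called the critical value or cutoff value. The English equivalent is threshold. A threshold is the value of some condition required for an object to undergo a certain change, and it is a common term in academic research. Depending on the condition itself, a threshold may be expressed in different units. The term is used widely across fields including architecture, biology, aerospace, chemistry, telecommunications, electronics, and psychology, and serves as a root from which many related terms are derived. It has no necessary connection with an extremum.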

 

!! Which makes it hard to understand, in

Speech recognition module for Python, supporting several engines and APIs, online and offline. https://pypi.python.org/pypi/SpeechRe…

 

why there should be a mechanism like this

recognizer_instance.energy_threshold = 300 # type: float

Represents the energy level threshold for sounds. Values below this threshold are considered silence, and values above this threshold are considered speech. Can be changed.
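The thresholding described here can be illustrated without the library or a microphone. The sketch below is not SpeechRecognition's code: it assumes the energy of a chunk is its RMS amplitude over 16-bit samples, and the helper names, the 1.5 safety ratio, and the synthetic sample values are all hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a chunk of integer samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(ambient_samples, ratio=1.5):
    """Place the threshold a fixed ratio above the measured ambient level."""
    return ratio * rms(ambient_samples)

def is_speech(chunk, threshold):
    """Above the threshold counts as speech, at or below it as silence."""
    return rms(chunk) > threshold

# Synthetic square waves stand in for room noise and for a spoken chunk.
quiet_room = [100, -100] * 800
loud_room = [3000, -3000] * 800
speech = [1000, -1000] * 800

quiet_threshold = calibrate_threshold(quiet_room)
loud_threshold = calibrate_threshold(loud_room)

print(quiet_threshold, is_speech(speech, quiet_threshold))
print(loud_threshold, is_speech(speech, loud_threshold))
```

The same utterance is detected in the quiet room and missed in the loud one, which is why a single fixed value such as 300 cannot suit every microphone and room.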

……

If you’re having trouble with the recognizer trying to recognize words even when you’re not speaking, try tweaking this to a higher value. If you’re having trouble with the recognizer not recognizing your words when you are speaking, try tweaking this to a lower value. For example, a sensitive microphone or microphones in louder rooms might have an ambient energy level of up to 4000:

 

??

 

 

 

 

 

 

 

 

The Fourth Day of the Lunar New Year

Peace through all four seasons; may the five blessings come to your door

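The clam-shell dragon and tiger figures at Xishuipo, Puyang

The ancients laid out an astronomical star map in clam shells; it dates to about 6,500 years ago

Lingjiatan jade plaque

Liangzhu culture jade cong

The Twenty-Eight Mansions lacquer chest from the Tomb of Marquis Yi of Zeng, images on its five faces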

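In a tomb at 'Xishuipo, Puyang', dating from the middle Yangshao culture of the Neolithic some 6,500 years ago, there is a 'dragon and tiger figure' heaped up from 'clam shells', together with skeletal remains deliberately arranged by direction. What is it all saying? The Chinese archaeoastronomer Feng Shi holds that: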

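The text is quoted from the paper by Zheng Hangsheng and Hu Yipeng, 《天道左旋,天圆地方:社會運行的溯源和依據 ("The Way of Heaven Turns Leftward, Heaven Is Round and Earth Is Square: Tracing the Origins and Basis of How Society Operates")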

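The researcher who has interpreted this group of clam-shell dragon and tiger figures most thoroughly is the archaeoastronomer Feng Shi. Feng holds that the key to explaining the figures is the pattern at the feet of the tomb's occupant, due north, formed by a clam-shell trapezoid and human tibiae: it is a representation of the Northern Dipper. The clam-shell trapezoid stands for the bowl of the Dipper (斗魁), and the two tibiae laid crosswise on its east side stand for the handle (斗杓), so this is a structurally complete celestial chart of two symbols together with the Northern Dipper.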

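That the clam-shell trapezoid and the tibiae form an image of the Northern Dipper is established not only from the shape; more importantly, the clue lies in the two human tibiae that represent the handle. One ancient method of reckoning time was to measure how the length of a person's shadow changes, so the earliest shadow-measuring instrument was designed in imitation of the human body: this is the "表" (biao, the gnomon). Precisely because the human body, the gnomon, and time stand in this special relation, the ancients called the time-measuring gnomon "髀" (bi), and the original meaning of "髀" is the human leg bone. Ample evidence in historical sources shows that the ancient shadow-measuring "gnomon" derived from human bone, so that human bone, besides being part of a living body, once served in antiquity as an instrument for measuring the sun's shadow.

The Dipper figure in Tomb 45 at Xishuipo, Puyang, links the leg bone, the gnomon, and time, and embodies the combination of two ancient methods of determining time: erecting a gnomon to measure shadows, and observing the Northern Dipper. In this composition of clam-shell trapezoid and tibiae, the tibiae signify the instrument for determining time. The Northern Dipper, too, was the asterism that the ancient Chinese watched in the sky as their standard for determining time. With the tibiae as the long handle of the composition, and taking the composition as a whole, one can conclude that the figure formed by the clam-shell trapezoid and the tibiae is the Northern Dipper.

Once the Dipper is established, and given the layout and form of the whole image, the clam-shell dragon and tiger can only be interpreted as asterisms. The otherwise isolated dragon and tiger are thus naturally joined into a whole by the presence of the Dipper and become constellations in the sky, namely the Azure Dragon and the White Tiger of the Four Symbols. As for the oddly shaped tomb pit, its outline in fact presents the most primitive canopy-heaven (蓋天) diagram: the square lower half is the earth and the round upper half is the vault of heaven, so it holds the most primitive notion of "round heaven, square earth".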

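In this tomb, whose only grave goods are clam shells, the secret of "heaven" lies hidden: what accompanies the tomb's occupant in burial is nothing less than the stars of the whole sky. The bowl of the Dipper is made of shells, indicating that the bowl is in heaven, above; the handle is made of human leg bones, indicating that the handle points to earth, below. In heaven, above, it is […] and […]; on earth, below, it is […] and […]. It in fact reflects the ancients' vision of standing between heaven and earth, and embodies the pairing or connection of sky and earth and the mutual dealings of gods, ghosts, and humans. Moreover, that people 6,500 years ago had so fine an understanding of celestial phenomena shows that their lives were never apart from observing the sky, and not merely at the practical level of watching the heavens to fix the seasons; such devout imitation shows all the more that their ideas and actions were under the invisible constraint of "heaven".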

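In the post 《馬太福音 25:29;》 (Matthew 25:29), we discussed the fixity of the 'Pole Star' and the apparent motion of the 'Sun': from observation and practice, the people of remote antiquity arrived at the 'idea' that 'heaven is round and earth is square' and the 'principle' that 'heaven turns leftward while earth moves rightward'. By 'observing' the things of heaven and earth, people are able to build 'theories'; by inquiring into the 'origins' of 'concepts' and the 'grounds' of 'reasoning', they create 'philosophy'. So on the road of living and learning, it is really the case that 'affairs know no ancient or modern, and principles know no native or foreign': the 'meeting points' between 'similarities and differences' are often the 'elementary' 'ideas', and differing 'interpretations' of those 'elementary ideas' become distinct 'systems of doctrine'. In fact, 'string rewriting systems', 'Turing machines', and the 'λ calculus' give different 'profiles' of 『□□』, and can be 'translated' into one another through 『○○』; whether people say 『』 or speak of 『』, the various 'interpretations' rest entirely with the person!!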

─── Excerpted from "λ Calculus: A Conceptual Introduction (4)"