30 | Jan | 2018 | FreeSandal

【鼎革‧革鼎 (Reform and Renewal)】: Raspbian Stretch 《Part 6, K.3: Speech Interface 5 (Left)》

By way of the Wikipedia entry:

Speech recognition

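Speech recognition (also called speech identification or speech discrimination), also known as Automatic Speech Recognition (ASR), Computer Speech Recognition, or Speech To Text (STT), is a technology whose goal is to have a computer automatically convert the content of human speech into the corresponding text. It differs from speaker identification and speaker verification, which try to identify or verify who produced the speech rather than which words it contains.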

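Applications of speech recognition include voice dialing, voice navigation, indoor device control, spoken document retrieval, and simple dictation-based data entry. Combined with other natural language processing technologies such as machine translation and speech synthesis, speech recognition can be used to build more complex applications, for example speech-to-speech translation.[1]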

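The fields involved in speech recognition include signal processing, pattern recognition, probability theory, information theory, the mechanisms of speech production and hearing, artificial intelligence, and more.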

History

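The idea of automatic speech recognition was on the agenda even before the computer was invented; early vocoders can be regarded as the prototype of both speech recognition and speech synthesis. The "Radio Rex" toy dog produced in the 1920s may have been the earliest speech recognizer: when the dog's name was called, it sprang out of its base.[2] The earliest speech recognition system based on an electronic computer was Audrey, developed at AT&T Bell Laboratories, which could recognize the ten English digits. It worked by tracking the formants in the speech signal and achieved 98% accuracy.[3] By the late 1950s, Denes at University College London had added grammatical probabilities to speech recognition.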

In the 1960s, artificial neural networks were introduced into speech recognition. The two major breakthroughs of this era were Linear Predictive Coding (LPC) and Dynamic Time Warping (DTW).
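Dynamic Time Warping tackles the fact that the same word can be spoken faster or slower: it finds the cheapest monotonic alignment between two sequences of different lengths, so an utterance can be compared against a stored word template. The following is a minimal sketch, assuming each frame is a single number and using absolute difference as the local distance; recognizers of that era compared multi-dimensional feature frames (for example LPC-derived features) instead. The function name `dtw_distance` is illustrative only.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the minimal alignment cost between sequences a and b.

    Local distance is |a[i] - b[j]|; each step may advance a, b, or both,
    so a slow and a fast rendition of the same pattern can still match.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest way to align the first i frames of a with the
    # first j frames of b; row 0 and column 0 stand for empty prefixes.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance a only (b's frame is stretched)
                cost[i][j - 1],      # advance b only (a's frame is stretched)
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    template = [1, 2, 3, 2, 1]           # stored "word" pattern
    slow = [1, 1, 2, 2, 3, 3, 2, 1]      # same shape, spoken more slowly
    other = [3, 2, 1, 1, 1]              # a different pattern
    print(dtw_distance(template, slow))  # 0.0
    print(dtw_distance(template, other))
```

The slow rendition aligns with the template at zero cost even though the two sequences have different lengths, which a plain frame-by-frame comparison could not handle.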

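The most significant breakthrough in speech recognition was the application of the Hidden Markov Model (HMM). Building on the mathematics developed by Baum and the research of Rabiner and others, Kai-Fu Lee at Carnegie Mellon University finally implemented Sphinx, the first large-vocabulary speech recognition system based on hidden Markov models.[4] Strictly speaking, speech recognition has not moved beyond the HMM framework since then.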
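In an HMM-based recognizer, decoding means finding the most likely sequence of hidden states given the observed frames, and the Viterbi algorithm does this by dynamic programming. Below is a minimal sketch with discrete observations and a made-up two-state model (the states `sil` and `speech` and all probabilities are invented for illustration); Sphinx-style systems use continuous emission densities over acoustic feature vectors and much larger state networks, so treat this only as an illustration of the recurrence.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs: Sequence[str],
            states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely state path for obs and its log-probability."""
    # score[s] = best log-probability of any path ending in state s so far.
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda r: score[r] + _log(trans_p[r][s]))
            new_score[s] = score[prev] + _log(trans_p[prev][s]) + _log(emit_p[s][o])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    states = ("sil", "speech")
    start_p = {"sil": 0.8, "speech": 0.2}
    trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
               "speech": {"sil": 0.2, "speech": 0.8}}
    emit_p = {"sil": {"low": 0.9, "high": 0.1},
              "speech": {"low": 0.2, "high": 0.8}}
    path, logp = viterbi(["low", "high", "high", "low"],
                         states, start_p, trans_p, emit_p)
    print(path)  # ['sil', 'speech', 'speech', 'sil']
    print(math.exp(logp))
```

Working in log space keeps long utterances from underflowing to zero, which is why the scores are added rather than multiplied.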

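Although researchers have tried for many years to popularize the "dictation machine," speech recognition still cannot support dictation for unrestricted domains and unrestricted speakers.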

 

A path to the history of Carnegie Mellon University's CMUSphinx:

About CMUSphinx

CMUSphinx collects over 20 years of CMU research. All its advantages are hard to list, but just to name a few:

  • State-of-the-art speech recognition algorithms for efficient speech recognition. CMUSphinx tools are designed specifically for low-resource platforms
  • Flexible design
  • Focus on practical application development and not on research
  • Support for several languages like US English, UK English, French, Mandarin, German, Dutch, Russian and the ability to build models for others
  • BSD-like license which allows commercial distribution
  • Commercial support
  • Active development and release schedule
  • Active community (more than 400 users in the LinkedIn CMUSphinx group)
  • Wide range of tools for many speech-recognition related purposes (keyword spotting, alignment, pronunciation evaluation)

 

Yet I worry that, rather than explaining it all from the very beginning,

CMU Sphinx Downloads

Software

The CMU Sphinx toolkit has a number of packages for different tasks and applications. It’s sometimes confusing what to choose. To clear things up, here is the list:

  • Pocketsphinx — recognizer library written in C.
  • Sphinxtrain — acoustic model training tools
  • Sphinxbase — support library required by Pocketsphinx and Sphinxtrain
  • Sphinx4 — adjustable, modifiable recognizer written in Java

We recommend that you use the latest available releases:

If you want to try the bleeding-edge version, pull the latest code from GitHub and compile the packages from source, but remember that there is no guarantee they will be stable.

http://github.com/cmusphinx

Older releases and files can be found on SourceForge: http://sourceforge.net/projects/cmusphinx/files/

We do not maintain distribution-specific packages yet, but help updating them is truly appreciated. Some distributions already include CMUSphinx packages:

Models

CMUSphinx assumes that you use the statistical models which describe language. There are many models trained for various acoustic conditions and various performance requirements. We collect the best models available at our download page. We hope you’ll be able to find the best model for your language there:

Download models

……
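Among the statistical models a CMUSphinx decoder relies on is a language model, which scores how likely a sequence of words is so that the recognizer can prefer plausible sentences. The simplest such model is a bigram model; below is a minimal maximum-likelihood sketch. The class name `BigramModel` and the toy command corpus are illustrative assumptions, and real language models add smoothing and backoff so that unseen word pairs do not get probability zero.

```python
from collections import Counter
from typing import Iterable, List


def _tokens(sentence: str) -> List[str]:
    """Lower-case, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Maximum-likelihood bigram language model with no smoothing."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.history: Counter = Counter()  # count(prev) when used as a history
        self.pairs: Counter = Counter()    # count(prev, word)
        for sentence in corpus:
            words = _tokens(sentence)
            self.history.update(words[:-1])
            self.pairs.update(zip(words[:-1], words[1:]))

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = count(prev, word) / count(prev)."""
        if self.history[prev] == 0:
            return 0.0
        return self.pairs[(prev, word)] / self.history[prev]

    def sentence_prob(self, sentence: str) -> float:
        """Product of the bigram probabilities across the whole sentence."""
        words = _tokens(sentence)
        p = 1.0
        for prev, word in zip(words[:-1], words[1:]):
            p *= self.prob(prev, word)
        return p


if __name__ == "__main__":
    lm = BigramModel(["turn on the light",
                      "turn off the light",
                      "turn on the radio"])
    print(lm.prob("turn", "on"))                  # 2/3
    print(lm.sentence_prob("turn on the light"))  # 4/9
    print(lm.sentence_prob("light the on turn"))  # 0.0: "light" never starts a sentence
```

Because "turn on the light" scores higher than the scrambled word order, a decoder weighing acoustic evidence against this model would favor the grammatical reading when the audio is ambiguous.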

 

it would be better to tell it through a story:

Judy – Simplified Voice Control on Raspberry Pi

Judy is a simplified sister of Jasper, with a focus on education. It is designed to run on:

Raspberry Pi 3
Raspbian Jessie
Python 2.7

Unlike Jasper, Judy does not try to be cross-platform, does not let you pick your favorite Speech-to-Text or Text-to-Speech engine, and does not come with an API for pluggable modules. Judy tries to keep things simple and lets you experience the joy of voice control with as little hassle as possible.

A Speech-to-Text engine is a piece of software that converts human speech into a string of text. It lets the computer know what is being said. Conversely, a Text-to-Speech engine converts text into sound. It allows the computer to speak, probably as a response to your command.

Judy uses:

Additionally, you need:

  • a Speaker to plug into Raspberry Pi’s headphone jack
  • a USB Microphone

Plug them in. Let’s go.
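The division of labor in Judy's description (Speech-to-Text turns voice into text, Text-to-Speech turns text back into sound, and your own code decides what to do in between) can be sketched as a small loop. This is a hypothetical Python 3 sketch, not Judy's actual API; Judy itself targets Python 2.7 with a fixed choice of engines. `FakeSTT`, `FakeTTS`, and `voice_loop` are invented names, and the fakes treat audio chunks as text so the control flow runs without a microphone or speaker.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def speak(self, text: str) -> None: ...


class FakeSTT:
    """Hypothetical stand-in: treats each audio chunk as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").strip().lower()


class FakeTTS:
    """Hypothetical stand-in: records sentences instead of playing sound."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def speak(self, text: str) -> None:
        self.spoken.append(text)


def voice_loop(chunks: Iterable[bytes],
               stt: SpeechToText,
               tts: TextToSpeech,
               commands: Dict[str, Callable[[], str]]) -> None:
    """Hear, transcribe, act, and reply once per captured audio chunk."""
    for audio in chunks:
        phrase = stt.transcribe(audio)      # STT: voice -> text
        action = commands.get(phrase)       # look up a handler for the phrase
        reply = action() if action else f"Sorry, I did not catch: {phrase}"
        tts.speak(reply)                    # TTS: text -> voice


if __name__ == "__main__":
    tts = FakeTTS()
    commands = {"light on": lambda: "Turning the light on"}
    voice_loop([b"Light on", b"what time is it"], FakeSTT(), tts, commands)
    print(tts.spoken)
```

Keeping the engines behind two tiny interfaces is exactly the flexibility Judy deliberately gives up for simplicity; the sketch only shows where the STT and TTS pieces sit in the loop.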

 

And with that, let's talk a little about voice control ◎

Easy. Relaxed. Learning. Blogger