【鼎革‧革鼎】︰ Raspbian Stretch 《Part VI, K.3 - Speech Interface - 7.2F》

Climbing high to gaze afar and looking back on the road we came by, perhaps the direction can be seen more clearly!?

 

A word on this matter of the Raspberry Pi 'tucking Mount Tai under its arm to leap over the North Sea' (Mencius's figure for attempting the seemingly impossible):

Project DeepSpeech

Project DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques, based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow project to make the implementation easier.

 

Pre-built binaries that can be used for performing inference with a trained model can be installed with pip. Proper setup using virtual environment is recommended and you can find that documented below.
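The recommended virtual-environment setup might be sketched as follows; the directory name deepspeech-venv is our own choice, not something the README prescribes:

```shell
# Create an isolated Python environment (name "deepspeech-venv" is arbitrary).
python3 -m venv deepspeech-venv

# Activate it, so that pip installs into the environment rather than system-wide.
. deepspeech-venv/bin/activate

# With the environment active, the README's install step applies:
# pip install deepspeech
```

Activating the environment keeps the deepspeech package and its dependencies isolated from the system Python, which is what "proper setup" above is getting at.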

A pre-trained English model is available for use, and can be downloaded using the instructions below.

Once everything is installed you can then use the deepspeech binary to do speech-to-text on short, approximately 5 second, audio files (currently only WAVE files with 16-bit, 16 kHz, mono are supported in the Python client):

pip install deepspeech
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
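Since the Python client only accepts 16-bit, 16 kHz, mono WAVE files, it can be worth checking a clip's format before running inference. A minimal sketch using only the standard-library wave module; the helper name and the synthesized silent clip are our own illustration, not part of DeepSpeech:

```python
import wave

def is_deepspeech_ready(path):
    """Return True if the WAV file is 16-bit, 16 kHz, mono."""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2        # 16-bit = 2 bytes per sample
                and w.getframerate() == 16000  # 16 kHz sample rate
                and w.getnchannels() == 1)     # mono

# Synthesize a one-second silent clip in the expected format,
# so the check above can be demonstrated self-contained.
with wave.open("my_audio_file.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

print(is_deepspeech_ready("my_audio_file.wav"))  # -> True
```

A file that fails this check would need resampling or channel mixing (for example with sox or ffmpeg) before being handed to the deepspeech binary.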

Alternatively, quicker inference (the realtime factor on a GeForce GTX 1070 is about 0.44) can be performed using a supported NVIDIA GPU on Linux. (See the release notes to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu
deepspeech models/output_graph.pb models/alphabet.txt my_audio_file.wav
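"Realtime factor" here is processing time divided by audio duration, so values below 1.0 mean faster than realtime. A small worked example with the figures quoted above:

```python
def processing_time(audio_seconds, rtf):
    """Estimated transcription time: audio duration times the realtime factor."""
    return audio_seconds * rtf

# A 5-second clip (the suggested length) at the quoted RTF of 0.44
# on a GTX 1070 takes roughly 2.2 seconds to transcribe.
print(round(processing_time(5.0, 0.44), 2))  # -> 2.2
```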

See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check the required runtime dependencies.)

 

Hoping this spares us from 'searching for it among the crowd, a hundred and a thousand times over' (the Xin Qiji verse from which Baidu takes its name)?!