【鼎革‧革鼎】: Raspbian Stretch 《Part 6, K.3-Speech Interface-6.1》

Looking at the default configuration of Python Pocketsphinx:

Default config

If you don’t pass any argument while creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, it will use the following default values:

verbose = False
logfn = /dev/null (nul on Windows)
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict

Any other option must be passed into the config as is, without the leading - symbol.

If you want to disable default language model or dictionary, you can change the value of the corresponding options to False:

lm = False
dict = False

……
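To make the rule about passing other options "without the leading -" concrete, here is a minimal sketch of a hypothetical helper, cli_to_kwargs (not part of pocketsphinx), that turns pocketsphinx-style command-line pairs such as -kws_threshold 1e-20 into Python keyword arguments. It assumes the Python wrapper accepts the same option names as the command-line tools, and it converts numeric strings to int or float. Boolean yes/no values are left as strings and would need handling by hand.

```python
def _coerce(value):
    """Try int, then float; otherwise keep the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def cli_to_kwargs(argv):
    """Hypothetical helper: ['-opt', 'value', ...] -> {'opt': value}."""
    if len(argv) % 2:
        raise ValueError("expected option/value pairs")
    kwargs = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("-"):
            raise ValueError("not an option: %r" % flag)
        # Strip the leading '-': '-kws_threshold' becomes 'kws_threshold'
        kwargs[flag.lstrip("-")] = _coerce(value)
    return kwargs


kwargs = cli_to_kwargs(["-keyphrase", "forward", "-kws_threshold", "1e-20"])
print(kwargs)

# With pocketsphinx installed, the dict could then be passed on (not run here),
# turning off the default language model for keyphrase search:
#   from pocketsphinx import AudioFile
#   for phrase in AudioFile(lm=False, **kwargs):
#       print(phrase)
```

Keeping the conversion in one small function means a command line copied from the pocketsphinx tools can be reused in Python without retyping each option.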

 

Does this mean its sample speech file is in raw format?

Raw audio format

RAW Audio format, or just RAW Audio, is an audio file format for storing uncompressed audio in raw form. Comparable in size to WAV or AIFF, a RAW Audio file does not include any header information (sampling rate, bit depth, endianness, or number of channels). Data can be written in PCM, IEEE 754 or ASCII.

Extensions

Raw files can have a wide range of file extensions, common ones being .raw, .pcm, or .sam. They can also have no extension.

Playing

As there is no header, compatible audio players require information from the user that would normally be stored in a header, such as the encoding, sample rate, number of bits used per sample, and the number of channels.

………
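Because a RAW file carries no header, whoever reads it has to supply the format. One way to make such a file playable by ordinary players is to wrap it in a WAV header with Python's standard wave module. The sketch below assumes signed 16-bit little-endian mono PCM at 16000 Hz, the same parameters this post later passes to sf.read for goforward.raw. The name raw_to_wav_bytes is hypothetical, and big-endian data would need byte-swapping first.

```python
import io
import wave


def raw_to_wav_bytes(raw, sample_rate=16000, channels=1, sample_width=2):
    """Hypothetical helper: prepend a WAV header to header-less PCM bytes.

    Assumes the raw data is signed little-endian PCM (what WAV expects for
    16-bit samples); big-endian input must be byte-swapped beforehand.
    """
    frame_size = channels * sample_width
    if len(raw) % frame_size:
        raise ValueError("raw data is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        # These three values are exactly what the RAW file cannot tell us
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(raw)
    return buf.getvalue()
```

For example, the bytes read from a .raw file could be passed through raw_to_wav_bytes and written to a .wav file, which a standard player can then open without being told the encoding, rate or channel count.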

 

How, then, should one interface with librosa using "the file as the interface"?!

Then it came to mind ☆

Advanced I/O Use Cases

This section covers advanced use cases for input and output which go beyond the I/O functionality currently provided by librosa.

Read specific formats

librosa uses audioread for reading audio. While we chose this library for its flexibility and its support of various compressed formats such as MP3, some specific formats might not be supported. In particular, WAV subformats like 24-bit PCM or 32-bit float might cause problems depending on your installed audioread codecs. libsndfile covers many of these formats, and there is a neat wrapper for it called PySoundFile that makes the library easy to use from Python.

Note

See the installation instructions for PySoundFile.

Reading audio files using PySoundFile is similar to the method in librosa. One important difference is that the read data has shape (nb_samples, nb_channels), compared to (nb_channels, nb_samples) in librosa.core.load. Also, the signal is not resampled to 22050 Hz by default, so it needs to be transposed and resampled for further processing in librosa. The following example is equivalent to librosa.load(librosa.util.example_audio_file()):

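import librosa
import soundfile as sf

# Get example audio file (librosa.util.example_audio_file exists in older librosa releases)
filename = librosa.util.example_audio_file()

# soundfile returns (nb_samples, nb_channels) plus the file's native sample rate
data, samplerate = sf.read(filename, dtype='float32')
# Transpose to librosa's (nb_channels, nb_samples) layout
data = data.T
# librosa.load also downmixes to mono by default (mono=True)
data = librosa.to_mono(data)
# Resample to librosa's default rate; keyword arguments work across librosa versions
data_22k = librosa.resample(data, orig_sr=samplerate, target_sr=22050)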

───

 

Why not go straight to the source:

PySoundFile

PySoundFile is an audio library based on libsndfile, CFFI and NumPy. Full documentation is available on http://pysoundfile.readthedocs.org/.

PySoundFile can read and write sound files. File reading/writing is supported through libsndfile, which is a free, cross-platform, open-source (LGPL) library for reading and writing many different sampled sound file formats that runs on many platforms including Windows, OS X, and Unix. It is accessed through CFFI, which is a foreign function interface for Python calling C code. CFFI is supported for CPython 2.6+, 3.x and PyPy 2.0+. PySoundFile represents audio data as NumPy arrays.

PySoundFile is BSD licensed (BSD 3-Clause License).
(c) 2013, Bastian Bechtold

───

 

Let's take a closer look ◎

soundfile.available_formats

soundfile.available_formats()

Return a dictionary of available major formats.

Examples

In [1]: import soundfile as sf

In [2]: sf.available_formats()
Out[2]: 
{'AIFF': 'AIFF (Apple/SGI)',
 'AU': 'AU (Sun/NeXT)',
 'AVR': 'AVR (Audio Visual Research)',
 'CAF': 'CAF (Apple Core Audio File)',
 'FLAC': 'FLAC (Free Lossless Audio Codec)',
 'HTK': 'HTK (HMM Tool Kit)',
 'IRCAM': 'SF (Berkeley/IRCAM/CARL)',
 'MAT4': 'MAT4 (GNU Octave 2.0 / Matlab 4.2)',
 'MAT5': 'MAT5 (GNU Octave 2.1 / Matlab 5.0)',
 'MPC2K': 'MPC (Akai MPC 2k)',
 'NIST': 'WAV (NIST Sphere)',
 'OGG': 'OGG (OGG Container format)',
 'PAF': 'PAF (Ensoniq PARIS)',
 'PVF': 'PVF (Portable Voice Format)',
 'RAW': 'RAW (header-less)',
 'RF64': 'RF64 (RIFF 64)',
 'SD2': 'SD2 (Sound Designer II)',
 'SDS': 'SDS (Midi Sample Dump Standard)',
 'SVX': 'IFF (Amiga IFF/SVX8/SV16)',
 'VOC': 'VOC (Creative Labs)',
 'W64': 'W64 (SoundFoundry WAVE 64)',
 'WAV': 'WAV (Microsoft)',
 'WAVEX': 'WAVEX (Microsoft)',
 'WVE': 'WVE (Psion Series 3)',
 'XI': 'XI (FastTracker 2)'}

 

soundfile.available_subtypes(format=None)

Return a dictionary of available subtypes.

Parameters: format (str) – If given, only compatible subtypes are returned.

Examples

In [3]: sf.available_subtypes('RAW')
Out[3]: 
{'ALAW': 'A-Law',
 'DOUBLE': '64 bit float',
 'DWVW_12': '12 bit DWVW',
 'DWVW_16': '16 bit DWVW',
 'DWVW_24': '24 bit DWVW',
 'FLOAT': '32 bit float',
 'GSM610': 'GSM 6.10',
 'PCM_16': 'Signed 16 bit PCM',
 'PCM_24': 'Signed 24 bit PCM',
 'PCM_32': 'Signed 32 bit PCM',
 'PCM_S8': 'Signed 8 bit PCM',
 'PCM_U8': 'Unsigned 8 bit PCM',
 'ULAW': 'U-Law',
 'VOX_ADPCM': 'VOX ADPCM'}
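Because RAW has no header, even a file's length in seconds is only as reliable as the subtype you assume. A quick sanity check is to compute the duration from the file size. The table of sample widths below is our own and covers only the fixed-width subtypes in the listing above; raw_duration and raw_file_duration are hypothetical helpers. A byte count that does not divide evenly into frames hints that the guessed subtype or channel count is wrong.

```python
import os

# Bytes per sample for the fixed-width RAW subtypes listed above.
# Compressed subtypes (GSM610, DWVW_*, VOX_ADPCM) are deliberately omitted,
# since their byte count does not map linearly to a sample count.
BYTES_PER_SAMPLE = {
    "PCM_S8": 1, "PCM_U8": 1, "ALAW": 1, "ULAW": 1,
    "PCM_16": 2, "PCM_24": 3, "PCM_32": 4,
    "FLOAT": 4, "DOUBLE": 8,
}


def raw_duration(n_bytes, samplerate, channels=1, subtype="PCM_16"):
    """Seconds of audio in a header-less file under the assumed layout."""
    frame_size = BYTES_PER_SAMPLE[subtype] * channels
    if n_bytes % frame_size:
        raise ValueError("size is not a whole number of frames; wrong guess?")
    return (n_bytes // frame_size) / samplerate


def raw_file_duration(path, samplerate, channels=1, subtype="PCM_16"):
    """Same estimate, taking the byte count from the file on disk."""
    return raw_duration(os.path.getsize(path), samplerate, channels, subtype)
```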

 

soundfile.check_format(format, subtype=None, endian=None)

Check if the combination of format/subtype/endian is valid.

Examples

In [4]: sf.check_format('RAW', 'PCM_16')
Out[4]: True

 

soundfile.default_subtype(format)

Return the default subtype for a given format.

Examples

In [5]: sf.default_subtype('RAW')

(No output: soundfile defines no default subtype for RAW, so the call returns None.)

 

Though not without a hiccup ★

soundfile.info(file, verbose=False)

Returns an object with information about a SoundFile.

Parameters: verbose (bool) – Whether to print additional information.

Examples

In [6]: sf.info('./test/goforward.RAW')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9b0cdef67ec7> in <module>()
----> 1 sf.info('./test/goforward.RAW')

/usr/local/lib/python3.5/dist-packages/soundfile.py in info(file, verbose)
    550         Whether to print additional information.
    551     """
--> 552     return _SoundFileInfo(file, verbose)
    553 
    554 

/usr/local/lib/python3.5/dist-packages/soundfile.py in __init__(self, file, verbose)
    497     def __init__(self, file, verbose):
    498         self.verbose = verbose
--> 499         with SoundFile(file) as f:
    500             self.name = f.name
    501             self.samplerate = f.samplerate

/usr/local/lib/python3.5/dist-packages/soundfile.py in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
    737         self._mode = mode
    738         self._info = _create_info_struct(file, mode, samplerate, channels,
--> 739                                          format, subtype, endian)
    740         self._file = self._open(file, mode_int, closefd)
    741         if set(mode).issuperset('r+') and self.seekable():

/usr/local/lib/python3.5/dist-packages/soundfile.py in _create_info_struct(file, mode, samplerate, channels, format, subtype, endian)
   1520     if 'r' not in mode or format.upper() == 'RAW':
   1521         if samplerate is None:
-> 1522             raise TypeError("samplerate must be specified")
   1523         info.samplerate = samplerate
   1524         if channels is None:

TypeError: samplerate must be specified

 

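Just as the Year of the Dog and its 『狗來富』 ("the dog brings fortune", a near-homophone of "go forward") are about to arrive: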

RAW Files

Pysoundfile can usually auto-detect the file type of sound files. This is not possible for RAW files, though:

In [7]: data, samplerate = sf.read('./test/goforward.RAW', channels=1, samplerate=16000, subtype='PCM_16')

Note that on x86, this defaults to endian='LITTLE'. If you are reading big endian data (mostly old PowerPC/6800-based files), you have to set endian='BIG' accordingly.

You can write RAW files in a similar way, but be advised that in most cases, a more expressive format is better and should be used instead.
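To see what the extra arguments to sf.read actually supply, here is a pure standard-library sketch that decodes header-less signed 16-bit PCM the way the call above is asked to. read_raw_pcm16 is a hypothetical stand-in, not part of soundfile; the 1/32768 scaling is what we believe libsndfile applies when normalizing PCM_16 to floats. Note that the sample rate plays no part in decoding: it is metadata you have to carry alongside the samples.

```python
import array
import sys


def read_raw_pcm16(raw, channels=1, endian="LITTLE"):
    """Hypothetical stand-in for sf.read(..., subtype='PCM_16', dtype='float64').

    Returns a flat list of floats for mono, or a list of per-frame tuples
    otherwise, scaled by 1/32768 into [-1.0, 1.0).
    """
    if endian.upper() not in ("LITTLE", "BIG"):
        raise ValueError("endian must be 'LITTLE' or 'BIG'")
    samples = array.array("h")  # C signed short, 16 bits on common platforms
    if samples.itemsize != 2:
        raise RuntimeError("C short is not 16 bits on this platform")
    if len(raw) % (2 * channels):
        raise ValueError("data is not a whole number of frames")
    samples.frombytes(raw)
    # array reads in the machine's byte order; swap if the data's order differs
    if endian.lower() != sys.byteorder:
        samples.byteswap()
    scaled = [s / 32768.0 for s in samples]
    if channels == 1:
        return scaled
    return [tuple(scaled[i:i + channels]) for i in range(0, len(scaled), channels)]
```

Mirroring soundfile, mono data comes back one-dimensional, and endian='BIG' is what old big-endian recordings would need.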


【鼎革‧革鼎】: Raspbian Stretch 《Part 6, K.3-Speech Interface-6.0》

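Wanting to dig deeper into 『音轉文』 STT (Speech To Text), I set out to build a Jupyter notebook environment, hoping to connect it with MIR (music information retrieval) techniques!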

 

So first I needed to check

Pocketsphinx Python


Pocketsphinx is a part of the CMU Sphinx Open Source Toolkit For Speech Recognition.

This package provides a python interface to CMU Sphinxbase and Pocketsphinx libraries created with SWIG and Setuptools.

Supported platforms

  • Windows
  • Linux
  • Mac OS X

Installation

# Make sure we have up-to-date versions of pip, setuptools and wheel:
pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx

 

for feasibility.

Trying it out, things can go well:

pi@raspberrypi:~ $ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04) 
[GCC 6.3.0 20170124] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pocketsphinx import AudioFile
>>> audio_file = '/usr/local/lib/python3.5/dist-packages/pocketsphinx/data/goforward.raw'
>>> for phrase in AudioFile(audio_file=audio_file): print(phrase)
... 
go forward ten meters
>>> 

 

or badly:

>>> from pocketsphinx import LiveSpeech
>>> for phrase in LiveSpeech(): print(phrase)
... 
Error opening audio device (null) for capture: Connection refused
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pocketsphinx/__init__.py", line 206, in __init__
    self.ad = Ad(self.audio_device, self.sampling_rate)
  File "/usr/local/lib/python3.5/dist-packages/sphinxbase/ad.py", line 124, in __init__
    this = _ad.new_Ad(audio_device, sampling_rate)
RuntimeError: new_Ad returned -1
>>> 

 

At least I figured out why this error message appears:

>>> for phrase in LiveSpeech(audio_device='sysdefault'): print(phrase)
... 
Error opening audio device sysdefault for capture: Connection refused
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pocketsphinx/__init__.py", line 206, in __init__
    self.ad = Ad(self.audio_device, self.sampling_rate)
  File "/usr/local/lib/python3.5/dist-packages/sphinxbase/ad.py", line 124, in __init__
    this = _ad.new_Ad(audio_device, sampling_rate)
RuntimeError: new_Ad returned -1
>>>

 

Here is the explanation:

Q: Failed to open audio device(/dev/dsp): No such file or directory

Device file /dev/dsp is missing because OSS support is not enabled in the kernel. You can either compile pocketsphinx with ALSA support by installing the ALSA development headers (package libasound2-dev or alsa-devel) and recompiling, or you can install the oss-compat package to enable OSS support.

Installation is not an issue once you understand the complexity of audio subsystems in Linux. Unfortunately the audio subsystem is complex, but once you get it, things will be easier. Historically, the audio subsystem has been quite fragmented. It includes the following major frameworks:

  • Old Unix-like DSP framework (OSS) – everything is handled by the kernel-space driver. Applications interact with the /dev/dsp device to play and record audio
  • ALSA – a newer audio subsystem, partially in the kernel but with a userspace library, libasound. ALSA also provides a DSP compatibility layer through the snd_pcm_oss driver, which creates the /dev/dsp device and emulates OSS audio
  • PulseAudio – an even newer system that works on top of the ALSA library libasound but adds a sound server to centralize all processing. Applications communicate with the server through the libpulse library, which they must use to record sound
  • Jack – another sound server, which also works on top of ALSA and provides another library, libjack

Besides these, there are other, less popular frameworks that sphinxbase doesn’t support, for example ESD (old GNOME sound server), ARTS (old KDE sound server) and PortAudio (portable library usable across Windows, Linux and Mac).

The recommended audio framework on Ubuntu is pulseaudio.

Sphinxbase and pocketsphinx support all these frameworks and automatically select the one you need at compile time, with PulseAudio having the highest priority. Before you install sphinxbase, decide which framework to use, and then install the development package of that framework.

For example, it’s recommended to install the libpulse-dev package to provide access to PulseAudio; after that, sphinxbase will automatically work with PulseAudio. Once you work with PulseAudio, you do not need the other frameworks. On embedded devices, try configuring ALSA.
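Since the FAQ says sphinxbase picks its audio back end at compile time, it can help to check what a Raspbian box actually has before rebuilding. The sketch below is a heuristic of our own: audio_backends is a hypothetical name, and the header paths are assumed Debian/Raspbian defaults. It only inspects files and shared libraries via the standard ctypes.util.find_library. It cannot tell whether a PulseAudio server is actually running. A server that cannot be reached is one plausible, unverified reading of the "Connection refused" errors above.

```python
import ctypes.util
import os

# Assumed Debian/Raspbian locations of each framework's development header;
# building sphinxbase needs these headers, not just the runtime library.
BACKENDS = {
    "pulseaudio": ("pulse", "/usr/include/pulse/pulseaudio.h"),
    "alsa": ("asound", "/usr/include/alsa/asoundlib.h"),
    "jack": ("jack", "/usr/include/jack/jack.h"),
}


def audio_backends():
    """Heuristic report of which audio frameworks look present on this machine."""
    report = {"oss": {"device": os.path.exists("/dev/dsp")}}
    for name, (lib, header) in BACKENDS.items():
        report[name] = {
            # Runtime shared library, e.g. libpulse.so.0
            "runtime": ctypes.util.find_library(lib) is not None,
            # Development header, e.g. from the libpulse-dev package
            "headers": os.path.exists(header),
        }
    return report


if __name__ == "__main__":
    for name, status in sorted(audio_backends().items()):
        print(name, status)
```

A back end whose runtime library is present but whose headers are missing is a hint that the corresponding -dev package still needs installing before recompiling.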

 

I recount this stumbling process here, if only to give you a laugh ☆