【鼎革‧革鼎】︰ Raspbian Stretch (Part VI, J.1, second half)

How much is there to know about floating-point (float) audio? A SciPy function spells it out:

scipy.io.wavfile.read

scipy.io.wavfile.read(filename, mmap=False)
Open a WAV file

Return the sample rate (in samples/sec) and data from a WAV file.

Parameters:

filename : string or open file handle

Input wav file.

mmap : bool, optional

Whether to read data as memory-mapped. Only to be used on real files (Default: False).

New in version 0.12.0.

Returns:

rate : int

Sample rate of wav file.

data : numpy array

Data read from wav file. Data-type is determined from the file; see Notes.

Notes

This function cannot read wav files with 24-bit data.

Common data types: [R120]

WAV format              Min            Max            NumPy dtype
32-bit floating-point   -1.0           +1.0           float32
32-bit PCM              -2147483648    +2147483647    int32
16-bit PCM              -32768         +32767         int16
8-bit PCM               0              255            uint8

Note that 8-bit PCM is unsigned.

References

[R120] (1, 2) IBM Corporation and Microsoft Corporation, “Multimedia Programming Interface and Data Specifications 1.0”, section “Data Format of the Samples”, August 1991 http://www.tactilemedia.com/info/MCI_Control_Info.html
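A minimal usage sketch, assuming a placeholder file named example.wav: the dtype returned by scipy.io.wavfile.read tells us which row of the table above applies, and integer PCM can then be normalized to float32 samples between -1.0 and +1.0.

# Sketch: read a WAV file and normalize integer PCM to float32 in [-1.0, +1.0].
# "example.wav" is a placeholder file name.
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("example.wav")        # rate in samples/sec; dtype depends on the file
print(rate, data.dtype, data.shape)

if data.dtype == np.int16:                      # 16-bit PCM: -32768 .. +32767
    audio = data.astype(np.float32) / 32768.0
elif data.dtype == np.int32:                    # 32-bit PCM: -2147483648 .. +2147483647
    audio = data.astype(np.float32) / 2147483648.0
elif data.dtype == np.uint8:                    # 8-bit PCM is unsigned: 0 .. 255
    audio = (data.astype(np.float32) - 128.0) / 128.0
else:                                           # 32-bit float files are already in -1.0 .. +1.0
    audio = data.astype(np.float32)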

※ Reference

Single-precision floating-point format

Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point. In IEEE 754-2008 the 32-bit base-2 format is officially referred to as binary32. It was called single in IEEE 754-1985. In older computers, different floating-point formats of 4 bytes were used, e.g., GW-BASIC's single-precision data type was the 32-bit MBF floating-point format.

One of the first programming languages to provide single- and double-precision floating-point data types was Fortran. Before the widespread adoption of IEEE 754-1985, the representation and properties of the double float data type depended on the computer manufacturer and computer model.

Single-precision binary floating-point is used due to its wider range over fixed point (of the same bit-width), even if at the cost of precision. A signed 32-bit integer can have a maximum value of 2^31 − 1 = 2,147,483,647, whereas the maximum representable IEEE 754 floating-point value is (2 − 2^−23) × 2^127 ≈ 3.402823 × 10^38. All integers with 6 or fewer significant decimal digits can be converted to an IEEE 754 floating-point value without loss of precision, and any number that can be written as 2^n such that n is a whole number from −126 to 127 can be converted to an IEEE 754 floating-point number without a loss of precision.

Single precision is termed REAL in Fortran,[1] SINGLE-FLOAT in Common Lisp,[2] float in C, C++, C#, Java,[3] Float in Haskell,[4] and Single in Object Pascal (Delphi), Visual Basic, and MATLAB. However, float in Python, Ruby, PHP, and OCaml and single in versions of Octave before 3.2 refer to double-precision numbers. In most implementations of PostScript, the only real precision is single.
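A quick way to check these binary32 figures is NumPy's np.finfo; the 16777216.0 line below is a small hand-picked illustration of the precision limit, not taken from the quoted article.

# Sketch: inspect the limits of single precision (binary32).
import numpy as np

info = np.finfo(np.float32)
print(info.max)     # about 3.4028235e+38, i.e. (2 - 2^-23) * 2^127
print(info.tiny)    # smallest positive normal number, 2^-126
# 2^24 + 1 needs more significand bits than binary32 has, so it rounds back down:
print(np.float32(16777216.0) + np.float32(1.0))   # prints 16777216.0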

WAV

Waveform Audio File Format (WAVE, or more commonly known as WAV due to its filename extension)[3][6][7][8] (rarely, Audio for Windows)[9] is a Microsoft and IBM audio file format standard for storing an audio bitstream on PCs. It is an application of the Resource Interchange File Format (RIFF) bitstream format method for storing data in “chunks”, and thus is also close to the 8SVX and the AIFF format used on Amiga and Macintosh computers, respectively. It is the main format used on Windows systems for raw and typically uncompressed audio. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.
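A rough sketch of that chunk layout, assuming a plain canonical PCM file (placeholder name example.wav) whose "fmt " chunk immediately follows the RIFF header:

# Sketch: peek at the RIFF/WAVE header and the "fmt " chunk of a canonical PCM file.
import struct

with open("example.wav", "rb") as f:
    riff, riff_size, wave = struct.unpack("<4sI4s", f.read(12))   # b'RIFF', chunk size, b'WAVE'
    fmt_id, fmt_size = struct.unpack("<4sI", f.read(8))           # b'fmt ', usually 16 for PCM
    audio_format, channels, sample_rate, byte_rate, block_align, bits = struct.unpack("<HHIIHH", f.read(16))
    # audio_format 1 = integer LPCM, 3 = IEEE float
    print(riff, wave, audio_format, channels, sample_rate, bits)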

──

 

Seemingly simple yet actually profound, sampling and quantization are smelted together in one furnace:

Sampling (signal processing)

Signal sampling representation. The continuous signal is represented with a green colored line while the discrete samples are indicated by the blue vertical lines.

In signal processing, sampling is the reduction of a continuous-time signal to a discrete-time signal. A common example is the conversion of a sound wave (a continuous signal) to a sequence of samples (a discrete-time signal).

A sample is a value or set of values at a point in time and/or space.

A sampler is a subsystem or operation that extracts samples from a continuous signal.

A theoretical ideal sampler produces samples equivalent to the instantaneous value of the continuous signal at the desired points.
 …
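A minimal ideal-sampler sketch: sampling a 440 Hz sine at 44100 samples per second amounts to evaluating the continuous function at the instants n/fs.

# Sketch: turn a continuous-time sine into a discrete-time sequence.
import numpy as np

fs = 44100                          # sample rate, samples per second
f = 440.0                           # tone frequency in Hz
n = np.arange(fs)                   # one second of sample indices
x = np.sin(2 * np.pi * f * n / fs)  # x[n] = value of the continuous signal at t = n/fs
print(x.shape, x[:5])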

Quantization (signal processing)

 

The simplest way to quantize a signal is to choose the digital amplitude value closest to the original analog amplitude. This example shows the original analog signal (green), the quantized signal (black dots), the signal reconstructed from the quantized signal (yellow) and the difference between the original signal and the reconstructed signal (red). The difference between the original signal and the reconstructed signal is the quantization error and, in this simple quantization scheme, is a deterministic function of the input signal.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.

The difference between an input value and its quantized value (such as round-off error) is referred to as quantization error. A device or algorithmic function that performs quantization is called a quantizer. An analog-to-digital converter is an example of a quantizer.
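A minimal rounding-quantizer sketch, mapping a float signal in -1.0..+1.0 onto 16-bit integer levels and measuring the quantization error described above:

# Sketch: uniform quantization to 16-bit and the resulting error.
import numpy as np

fs = 8000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)                               # analog-like float signal

q = np.clip(np.round(x * 32767.0), -32768, 32767).astype(np.int16)   # nearest 16-bit level
x_hat = q.astype(np.float32) / 32767.0                                # reconstructed signal
error = x - x_hat                                                     # deterministic quantization error
print(np.abs(error).max())                                            # about half a quantization step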

──

 

Yet jackd is not the only word on the subject; the audacious have long since been using ◎

Welcome to Audacity

Audacity® is free, open source, cross-platform audio software for multi-track recording and editing.

Audacity is available for Windows®, Mac®, GNU/Linux® and other operating systems. Check our feature list, Wiki and Forum.

Bit Depth

Contents

  1. Dynamic Range
  2. Audacity Defaults
  3. Effects on file size and CPU use
  4. Which bit depth to use
  5. Bit depth of various sources
 

Dynamic Range

Bit depth is the number of bits used to carry the data in each sample of audio. The bit depth chosen for recording limits the dynamic range of the recording. (Other factors in the audio chain may also limit this, so more bits often will not produce a better recording.)

 

Audacity Defaults

The Audacity default quality settings are Sample Format 32-bit float (and Sample Rate 44100 Hz). It is strongly recommended that you use these settings unless you have good reasons to deviate from them. 32-bit float is chosen to give an extremely low noise floor and to provide good headroom to avoid sound distortion even when performing heavy editing and manipulation of the audio.

Audacity uses “float” format for 32-bit recording instead of fixed integer format because normalized floating point values are quicker and easier to process on computers than fixed integer values, and allow greater dynamic range to be retained even after editing. This is because intermediate signals during audio processing can vary widely in level. If they all get truncated to a fixed integer format, you can’t boost them back up to full scale without losing resolution (i.e. without the data becoming less representative of the original than it was before). With floating point, rounding errors during intermediate processing are negligible.

The (theoretically audible) advantage of this is that 32-bit floating point format retains the original noise floor, and does not add noise. For example, with fixed integer data, applying a compressor effect to lower the peaks by 9 dB and separately amplifying back up would cost 9 dB (or more than 2 bits) of signal to noise ratio (SNR). If done with floating point data, the SNR of the peaks remains as good as before (except that the quiet passages are 9 dB louder and so 9 dB noisier due to the noise they had in the first place).
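A crude illustration of that point: attenuate a full-scale tone by 9 dB and bring it back up, once through int16 and once through 32-bit float. The integer path loses resolution at the intermediate step; the float path does not.

# Sketch: intermediate processing in int16 versus float32.
import numpy as np

x = (32767 * np.sin(2 * np.pi * 440.0 * np.arange(44100) / 44100)).astype(np.int16)
gain = 10 ** (-9 / 20.0)                           # -9 dB

y_int = (x * gain).astype(np.int16)                # truncated to integer levels
back_int = (y_int / gain).astype(np.int16)         # resolution already lost

y_float = x.astype(np.float32) * gain              # fractional values preserved
back_float = (y_float / gain).astype(np.int16)     # restored to within rounding

print(np.abs(back_int.astype(int) - x.astype(int)).max())    # a few levels of error
print(np.abs(back_float.astype(int) - x.astype(int)).max())  # 0 or 1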

In many cases you will be exporting to a 16-bit format (for example if you are burning to a standard audio CD, that format is by definition 16-bit 44100 Hz). The advantage of using 32-bit float to work with holds even if you have to export to a 16-bit format. Using Dither on the Quality tab of Audacity Preferences will improve the sound quality of the exported file, so there are only minimal (probably non-audible) effects from converting the 32-bit audio down to 16-bit.

……

Digital Audio Fundamentals

Digital Audio Quality

The quality of a digital audio recording depends heavily on two factors: the sample rate and the sample format or bit depth. Increasing the sample rate or the number of bits in each sample increases the quality of the recording, but also increases the amount of space used by audio files on a computer or disk.
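For a sense of scale, uncompressed size is simply sample rate × bytes per sample × channels × duration, so a minute of CD-quality stereo is roughly 10 MB:

# Sketch: uncompressed audio size for one minute of 16-bit 44100 Hz stereo.
rate = 44100           # samples per second
bytes_per_sample = 2   # 16-bit
channels = 2           # stereo
seconds = 60

size = rate * bytes_per_sample * channels * seconds
print(size, "bytes")   # 10,584,000 bytes, about 10 MB per minute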

 

Sample rates

Sample rates are measured in hertz (Hz), or cycles per second. This value is the number of samples captured per second in order to represent the waveform. Higher sample rates allow higher audio frequencies to be represented. Provided that the sample rate is more than double the highest audio frequency present, the waveform can be reconstructed exactly from the digital samples. Frequencies that are more than half the sample rate cannot be correctly represented in digital samples, and, if present in the original audio, must be removed before converting to digital. “Half the sample rate” therefore represents an upper limit called the Nyquist frequency, and the analog waveform must be entirely below this limit to be correctly represented digitally. Analog frequencies at this limit or above cannot be correctly represented by the digital samples and would cause a kind of distortion called aliasing.

The human ear is sensitive to sound patterns with frequencies between approximately 20 Hz and 20000 Hz. Sounds outside that range are inaudible. Therefore a sample rate of 40000 Hz is the absolute minimum necessary to reproduce sounds within the range of human hearing. Higher rates (called oversampling) are usually used so as to allow adequate filtering to avoid aliasing artifacts around the Nyquist frequency.

The sample rate used by audio CDs is 44100 Hz. Human speech is intelligible even if frequencies above 4000 Hz are eliminated; in fact telephones only transmit frequencies between 200 Hz and 4000 Hz. Therefore a common sample rate for speech recordings is 8000 Hz, sometimes called speech quality. Note that very steep filtering (called an anti-aliasing filter) is required at the Nyquist frequency in order to prevent signal above this cutoff point from being folded back into the audible range by the digital converter, creating the distorting artifacts of aliasing noise.
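A quick aliasing check: at fs = 44100 Hz the Nyquist frequency is 22050 Hz, and a 30000 Hz tone yields exactly the same sample sequence as a 14100 Hz tone (44100 − 30000), so the two are indistinguishable once sampled.

# Sketch: a tone above the Nyquist frequency aliases into the audible band.
import numpy as np

fs = 44100
n = np.arange(fs)
high = np.cos(2 * np.pi * 30000.0 * n / fs)    # above Nyquist (22050 Hz)
alias = np.cos(2 * np.pi * 14100.0 * n / fs)   # its alias at 44100 - 30000 Hz
print(np.allclose(high, alias, atol=1e-8))     # True: identical samples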

The most common sample rates measured in Hz are 8000, 16000, 22050, 44100, 48000, 96000 and 192000. Sample rates can also be referred to in kHz or units of 1000 Hz. So in units of kHz the most common rates are expressed as 8 kHz, 16 kHz, 22.05 kHz, 44.1 kHz, 48 kHz, 96 kHz and 192 kHz.

Audacity supports any of these sample rates; however, most computer soundcards are limited to no more than 48000 Hz, 96000 Hz or sometimes 192000 Hz. Again, the most common sample rate by far is 44100 Hz, and many cards will thus default to this rate, whatever other rates they support.

In the image below, the left half has a low sample rate and the right half has a high sample rate (i.e. high resolution).

Sample formats

The other measure of audio quality is the sample format (or bit depth), which is usually measured by the number of computer bits used to represent each sample. The more bits that are used, the more precise the representation of each sample. Increasing the number of bits also increases the maximum dynamic range of the audio recording, in other words the difference in volume between the loudest and softest possible sounds that can be represented.

Dynamic range is measured in decibels (dB). The human ear can perceive sounds with a dynamic range of at least 90 dB. However, whenever possible it is a good idea to record digital audio with a dynamic range of far more than 90 dB, in part so that sounds that are too soft can be amplified for maximum fidelity. Note that although signals recorded at generally low levels can be raised (that is, normalized) to take advantage of the available dynamic range, the recording of low level signals will not use all of the available bit depth. This loss of resolution cannot be re-captured simply by normalizing the overall level of the digital waveform.

Common sample formats, and their respective dynamic range include:

  • 8-bit integer: 48 dB
  • 16-bit integer: 96 dB
  • 24-bit integer: 145 dB
  • 32-bit floating point: near-infinite dB
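These integer figures follow from the rule of thumb that each bit adds about 6.02 dB, i.e. 20·log10(2^bits):

# Sketch: theoretical dynamic range per integer bit depth.
import math

for bits in (8, 16, 24):
    print(bits, round(20 * math.log10(2 ** bits), 1))   # 48.2, 96.3 and 144.5 dB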

Note that there are practical limitations on dynamic range due to the capabilities of the hardware and input and output converters. These make the practical limit more like 90 dB for 16-bit.

Other sample formats such as ADPCM approximate 16-bit audio with compressed 4-bit samples. Audacity can import many of these formats, but they are rarely used because of much better newer compression methods.

Audio CDs and most computer audio file formats use 16-bit integers. Audacity uses 32-bit floating-point samples internally and, if required, converts the sample bit depth when the final mix is exported. Audacity’s default sample format during recording can be configured in the Quality Preferences or set individually for each track in the Audio Track Dropdown Menu. During playback, the audio in any tracks that have a different sample format from the project will be resampled on the fly using the Real-time Conversion settings in the Quality Preferences. The High-quality Conversion settings are used when processing, mixing or exporting.

In the image below, the left half has a sample format with few bits, and the right half has a sample format with more bits. If you think of the sample rate as the spacing between vertical gridlines, the sample format is the spacing between horizontal gridlines.

(Image: Waveform sample formats.png)

───