【鼎革‧革鼎】︰ Raspbian Stretch 《Part 6, K.3 Speech Interface, 7.2A》

Just as color is not an objective physical quantity, neither is loudness:

Loudness

In acoustics, loudness is the subjective perception of sound pressure. More formally, it is defined as, “That attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud.”[1] The relation of physical attributes of sound to perceived loudness consists of physical, physiological and psychological components. The study of apparent loudness is included in the topic of psychoacoustics and employs methods of psychophysics.

In different industries, loudness may have different meanings and different measurement standards. Some definitions such as LKFS refer to relative loudness of different segments of electronically reproduced sounds such as for broadcasting and cinema. Others, such as ISO 532A (Stevens loudness, measured in sones), ISO 532B (Zwicker loudness), DIN 45631 and ASA/ANSI S3.4, have a more general scope and are often used to characterize loudness of environmental noise.

It is sometimes stated that loudness is a subjective measure, often confused with physical measures of sound strength such as sound pressure, sound pressure level (in decibels), sound intensity or sound power. It is often possible to separate the truly subjective components such as social considerations from the physical and physiological.

Filters such as A-weighting attempt to adjust sound measurements to correspond to loudness as perceived by the typical human; however, this approach is only truly valid for the loudness of single tones. A-weighting follows human sensitivity to sound and describes relative perceived loudness at quiet to moderate speech levels, around 40 phons. However, physiological loudness perception is a much more complex process than can be captured with a single correction curve.[2] Not only do equal-loudness contours vary with intensity, but the perceived loudness of a complex sound depends on whether its spectral components are closely or widely spaced in frequency. When generating neural impulses in response to sounds of one frequency, the ear is less sensitive to nearby frequencies, which are said to lie in the same critical band. Sounds containing spectral components in many critical bands are perceived as louder even if the total sound pressure remains constant.

Explanation

The perception of loudness is related to sound pressure level (SPL), frequency content and duration of a sound. The human auditory system averages the effects of SPL over a 600–1000 ms interval. A sound of constant SPL will be perceived to increase in loudness as samples of duration 20, 50, 100, 200 ms are heard, up to a duration of about 1 second, at which point the perception of loudness will stabilize. For sounds of duration greater than 1 second, the moment-by-moment perception of loudness will be related to the average loudness during the preceding 600–1000 ms.

For sounds having a duration longer than 1 second, the relationship between SPL and loudness of a single tone can be approximated by Stevens’ power law, in which SPL has an exponent of 0.6. More precise measurements indicate that loudness increases with a higher exponent at low and high levels and with a lower exponent at moderate levels.
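
As a quick numerical illustration of that exponent (a sketch, using the usual convention that 1 sone is pinned at 40 dB SPL for a 1 kHz tone; the function name is ours, not from any standard):

```python
def sones_from_spl(spl_db):
    """Loudness in sones for a 1 kHz tone via Stevens' power law (exponent 0.6).

    With the 1-sone reference pinned at 40 dB SPL, this is equivalent to
    N = 2 ** ((spl_db - 40) / 10): every +10 dB roughly doubles loudness.
    """
    pressure_ratio = 10 ** ((spl_db - 40.0) / 20.0)  # sound pressure relative to 40 dB SPL
    return pressure_ratio ** 0.6

for spl in (40, 50, 60, 70):
    print(spl, round(sones_from_spl(spl), 2))  # ~1, 2, 4, 8 sones
```

Note how a pressure exponent of 0.6 turns the 10 dB pressure ratio of about 3.16 into a loudness factor of almost exactly 2, which is why "10 dB louder" is commonly described as "twice as loud".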

[Figure: equal-loudness contours; the horizontal axis shows frequency in Hz]

The sensitivity of the human ear changes as a function of frequency, as shown in the equal-loudness graph. Each line on this graph shows the SPL required for frequencies to be perceived as equally loud, and different curves pertain to different sound pressure levels. It also shows that humans with normal hearing are most sensitive to sounds around 2–4 kHz, with sensitivity declining to either side of this region. A complete model of the perception of loudness will include the integration of SPL by frequency.[2]

Historically, loudness was measured using an “ear-balance” audiometer in which the amplitude of a sine wave was adjusted by the user to equal the perceived loudness of the sound being evaluated. Contemporary standards for the measurement of loudness are based on the summation of energy in critical bands, as described in IEC 532, DIN 45631 and ASA/ANSI S3.4. A distinction is made between stationary loudness (sounds that remain sensibly constant) and non-stationary loudness (sound sources that move in space or change amplitude over time).

─── 【鼎革‧革鼎】︰ Raspbian Stretch, Part 6 I

 

To understand more deeply what the Speech Recognition library's energy_threshold actually controls, we need a closer acquaintance with 'loudness', and with several □ ○ 'weightings'!
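
An "energy threshold" is easiest to picture as a cut-off on per-frame signal energy. The sketch below computes RMS energy over 16-bit PCM frames; the threshold value and helper name are illustrative only, not the library's own implementation:

```python
import math
import struct

def rms_energy(frame_bytes):
    """RMS energy of one frame of 16-bit little-endian PCM samples."""
    n = len(frame_bytes) // 2
    samples = struct.unpack("<%dh" % n, frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / n)

# A hypothetical threshold: frames below it are treated as silence,
# frames above it as possible speech.
energy_threshold = 300

quiet = struct.pack("<4h", 10, -12, 8, -9)           # near-silence
loud = struct.pack("<4h", 2000, -1800, 2200, -2100)  # clearly audible
print(rms_energy(quiet) < energy_threshold)   # True
print(rms_energy(loud) > energy_threshold)    # True
```

The point of what follows is that raw energy like this ignores frequency: two frames of equal RMS energy can differ greatly in perceived loudness, which is where weighting curves come in.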

A-weighting

A-weighting is the most commonly used of a family of curves defined in the International standard IEC 61672:2003 and various national standards relating to the measurement of sound pressure level. A-weighting is applied to instrument-measured sound levels in an effort to account for the relative loudness perceived by the human ear, as the ear is less sensitive to low audio frequencies. It is employed by arithmetically adding a table of values, listed by octave or third-octave bands, to the measured sound pressure levels in dB. The resulting octave band measurements are usually added (logarithmic method) to provide a single A-weighted value describing the sound; the units are written as dB(A). Other weighting sets of values – B, C, D and now Z – are discussed below.
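
The weighting table mentioned above can also be generated from the analytic A-curve given in IEC 61672; a minimal sketch (the function name is ours):

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), from the IEC 61672 analytic curve."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0  # +2.0 dB pins the curve to ~0 dB at 1 kHz

# Reproduce a few entries of the familiar dB(A) correction table:
for f in (100, 1000, 10000):
    print(f, round(a_weighting_db(f), 1))
```

Adding these gains to measured band levels, then summing the bands logarithmically, yields the single dB(A) figure described above.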

The curves were originally defined for use at different average sound levels, but A-weighting, though originally intended only for the measurement of low-level sounds (around 40 phon), is now commonly used for the measurement of environmental noise and industrial noise, as well as when assessing potential hearing damage and other noise health effects at all sound levels; indeed, the use of A-frequency-weighting is now mandated for all these measurements, although it is badly suited for these purposes, being applicable only to low levels, so that it tends to undervalue the effects of low-frequency noise in particular.[1] It is also used when measuring low-level noise in audio equipment, especially in the United States. In Britain, Europe and many other parts of the world, broadcasters and audio engineers more often use the ITU-R 468 noise weighting, which was developed in the 1960s based on research by the BBC and other organizations. This research showed that our ears respond differently to random noise, and that the equal-loudness curves on which the A, B and C weightings were based are really only valid for pure single tones.

[Figure: the A-, B-, C- and D-weightings across the frequency range 10 Hz – 20 kHz]

 

Hence, let us use an MIR (music information retrieval) environment to see and hear this for ourselves◎

librosa.core.A_weighting(frequencies, min_db=-80.0)

Compute the A-weighting of a set of frequencies.

Parameters:
    frequencies : scalar or np.ndarray [shape=(n,)]
        One or more frequencies (in Hz)
    min_db : float [scalar] or None
        Clip weights below this threshold. If None, no clipping is performed.

Returns:
    A_weighting : scalar or np.ndarray [shape=(n,)]
        A_weighting[i] is the A-weighting of frequencies[i]
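
To see what the documented function computes without pulling in librosa itself, here is a pure-NumPy sketch with the same signature; it mirrors the documented behaviour (per-frequency A-weighting in dB, clipped from below at min_db) but is not librosa's own code:

```python
import numpy as np

def a_weighting(frequencies, min_db=-80.0):
    """Pure-NumPy sketch of the documented behaviour: A-weighting in dB
    for each input frequency, clipped from below at min_db."""
    f_sq = np.asanyarray(frequencies, dtype=float) ** 2
    const = np.array([12194.217, 20.598997, 107.65265, 737.86223]) ** 2
    weights = 2.0 + 20.0 * (
        np.log10(const[0])
        + 2.0 * np.log10(f_sq)
        - np.log10(f_sq + const[0])
        - np.log10(f_sq + const[1])
        - 0.5 * np.log10(f_sq + const[2])
        - 0.5 * np.log10(f_sq + const[3])
    )
    return weights if min_db is None else np.maximum(min_db, weights)

# Near 0 dB at 1 kHz; heavily attenuated at 10 Hz (and clipped if min_db is raised):
print(a_weighting(np.array([10.0, 1000.0, 10000.0])))
```

The min_db clip matters in practice: at very low frequencies the analytic curve plunges toward minus infinity, and clipping keeps the weights numerically well-behaved.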

 

We therefore pick the following to read:

#!/usr/bin/env python3

# NOTE: this example requires PyAudio because it uses the Microphone class

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# write audio to a RAW file
with open("microphone-results.raw", "wb") as f:
    f.write(audio.get_raw_data())

# write audio to a WAV file
with open("microphone-results.wav", "wb") as f:
    f.write(audio.get_wav_data())

# write audio to an AIFF file
with open("microphone-results.aiff", "wb") as f:
    f.write(audio.get_aiff_data())

# write audio to a FLAC file
with open("microphone-results.flac", "wb") as f:
    f.write(audio.get_flac_data())

※ Note

AudioFile(filename_or_fileobject: Union[str, io.IOBase]) -> AudioFile

Creates a new AudioFile instance given a WAV/AIFF/FLAC audio file filename_or_fileobject. Subclass of AudioSource.

If filename_or_fileobject is a string, then it is interpreted as a path to an audio file on the filesystem. Otherwise, filename_or_fileobject should be a file-like object such as io.BytesIO or similar.

Note that functions that read from the audio (such as recognizer_instance.record or recognizer_instance.listen) will move ahead in the stream. For example, if you execute recognizer_instance.record(audiofile_instance, duration=10) twice, the first time it will return the first 10 seconds of audio, and the second time it will return the 10 seconds of audio right after that. This is always reset when entering the context with a context manager.

WAV files must be in PCM/LPCM format; WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported and may result in undefined behaviour.

Both AIFF and AIFF-C (compressed AIFF) formats are supported.

FLAC files must be in native FLAC format; OGG-FLAC is not supported and may result in undefined behaviour.

Instances of this class are context managers, and are designed to be used with with statements:

import speech_recognition as sr
with sr.AudioFile("SOME_AUDIO_FILE") as source:    # open the audio file for reading
    pass                                           # do things here - ``source`` is the AudioFile instance created above
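
The stream-advancing behaviour described in the note above can be mimicked with the standard library's wave module alone; this sketch only illustrates the seek semantics and does not use speech_recognition itself:

```python
import io
import wave

# Build a 2-second, mono, 16-bit, 16 kHz WAV file of silence in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 32000)
buf.seek(0)

# Successive reads move ahead in the stream, just as successive calls to
# recognizer_instance.record(audiofile_instance, duration=...) do:
with wave.open(buf, "rb") as w:
    first = w.readframes(16000)   # the first second of audio
    second = w.readframes(16000)  # the second right after it
print(len(first), len(second))    # each holds 16000 frames x 2 bytes
```

Re-entering the with block (here, reopening the buffer) resets the position, which is exactly the context-manager reset behaviour AudioFile documents.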

 

May we first become fluent with the 'file interface'.