【鼎革‧革鼎】: Raspbian Stretch 《Part 6, K.3: Speech Interface, 6.2》

How far apart are speech-to-text (STT) and "transcription"?

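Transcription (Chinese: 扒帶, also called 扒譜) is the process in which a person listens to a piece of music repeatedly and writes down its full score by ear. Its main purpose is to recover the original score so that the piece can be studied and performed. It also helps when recreating a piece as MIDI music; many mobile phone ringtones, for example, are currently produced this way.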

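The quality of a transcription depends directly on how sensitive the ear is to pitch and how well it can tell instruments apart. Conversely, transcribing is itself very good ear training.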

──

Transcription (music)

In music, transcription can mean notating a piece or a sound which was previously unnotated, as, for example, an improvised jazz solo. When a musician is tasked with creating sheet music from a recording and writes down the notes that make up the song in music notation, that musician is said to have created a musical transcription of the recording. Transcription may also mean rewriting a piece of music, either solo or ensemble, for an instrument or instruments other than those for which it was originally intended. Franz Liszt's piano transcriptions of the Beethoven symphonies are a good example. Transcription in this sense is sometimes called arrangement, although strictly speaking transcriptions are faithful adaptations, whereas arrangements change significant aspects of the original piece.

Further examples of music transcription include ethnomusicological notation of oral traditions of folk music, such as Béla Bartók's and Ralph Vaughan Williams' collections of the national folk music of Hungary and England respectively. The French composer Olivier Messiaen transcribed birdsong in the wild, and incorporated it into many of his compositions, for example his Catalogue d'oiseaux for solo piano. Transcription of this nature involves scale degree recognition and harmonic analysis, both of which the transcriber will need relative or perfect pitch to perform.

In popular music and rock, there are two forms of transcription. Individual performers copy a note-for-note guitar solo or other melodic line. As well, music publishers transcribe entire recordings of guitar solos and bass lines and sell the sheet music in bound books. Music publishers also publish PVG (piano/vocal/guitar) transcriptions of popular music, where the melody line is transcribed, and then the accompaniment on the recording is arranged as a piano part. The guitar aspect of the PVG label is achieved through guitar chords written above the melody. Lyrics are also included below the melody.

───

 

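Readers who have worked through the exercise in the MIR notes

Pitch Transcription Exercise

should already know the decomposition, synthesis, and evaluation methods it uses.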

Automatic music transcription

The term “automatic music transcription” was first used by audio researchers James A. Moorer, Martin Piszczalski, and Bernard Galler in 1977. With their knowledge of digital audio engineering, these researchers believed that a computer could be programmed to analyze a digital recording of music such that the pitches of melody lines and chord patterns could be detected, along with the rhythmic accents of percussion instruments. The task of automatic music transcription concerns two separate activities: making an analysis of a musical piece, and printing out a score from that analysis.[1]

This was not a simple goal, but one that would encourage academic research for at least another three decades. Because of the close scientific relationship of speech to music, much academic and commercial research that was directed toward the more financially resourced speech recognition technology would be recycled into research about music recognition technology. While many musicians and educators insist that manually doing transcriptions is a valuable exercise for developing musicians, the motivation for automatic music transcription remains the same as the motivation for sheet music: musicians who do not have intuitive transcription skills will search for sheet music or a chord chart, so that they may quickly learn how to play a song. A collection of tools created by this ongoing research could be of great aid to musicians. Since much recorded music does not have available sheet music, an automatic transcription device could also offer transcriptions that are otherwise unavailable in sheet music. To date, no software application can yet completely fulfill James Moorer’s definition of automatic music transcription. However, the pursuit of automatic music transcription has spawned the creation of many software applications which can aid in manual transcription. Some can slow down music while maintaining original pitch and octave, some can track the pitch of melodies, some can track the chord changes, and others can track the beat of music.

Automatic transcription most fundamentally involves identifying the pitch and duration of the performed notes. This entails tracking pitch and identifying note onsets. After capturing those physical measurements, this information is mapped into traditional music notation, i.e., the sheet music.
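The final step described above, mapping a measured pitch onto notation, can be sketched in a few lines. This is only an illustration: it assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbering (A4 = 69, C4 = 60), and the names `hz_to_midi_number` and `midi_number_to_name` are hypothetical, not taken from any library mentioned here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi_number(freq_hz, a4_hz=440.0):
    """Round a frequency to the nearest MIDI note number (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Each semitone is a factor of 2 ** (1 / 12), so 12 * log2 counts semitones.
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))

def midi_number_to_name(midi_number):
    """Name a MIDI note, using the convention that MIDI 60 is C4."""
    octave = midi_number // 12 - 1
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"

if __name__ == "__main__":
    for freq in (261.63, 440.0, 446.0):
        number = hz_to_midi_number(freq)
        print(freq, number, midi_number_to_name(number))
```

Rounding to the nearest semitone is what discards small tuning deviations; a real transcriber would also need note durations from the onset times, which this sketch leaves out.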

Digital Signal Processing is the branch of engineering that provides software engineers with the tools and algorithms needed to analyze a digital recording in terms of pitch (note detection of melodic instruments), and the energy content of un-pitched sounds (detection of percussion instruments). Musical recordings are sampled at a given recording rate and their frequency data is stored in any digital wave format in the computer. Such a format represents sound by digital sampling.
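As a rough illustration of what "note detection" on sampled audio can look like, the sketch below estimates the pitch of one frame by autocorrelation. It is one simple technique among many, not the method of any particular transcription tool. It assumes a clean, single-pitch frame, and `estimate_pitch_autocorr` and `sine_frame` are hypothetical helper names.

```python
import math

def estimate_pitch_autocorr(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame of audio.

    The unnormalized autocorrelation r(lag) = sum_n x[n] * x[n + lag]
    peaks when the lag equals one period of the signal, so the estimate
    is sample_rate / best_lag. Only lags matching fmin..fmax are searched.
    """
    n = len(samples)
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 1, int(sample_rate / fmin))
    best_lag, best_value = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        raise ValueError("frame too short for the requested pitch range")
    return sample_rate / best_lag

def sine_frame(freq_hz, sample_rate, num_samples):
    """Synthesize a test tone so the sketch needs no audio file."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]

if __name__ == "__main__":
    sr = 8000
    frame = sine_frame(200.0, sr, 1000)
    print(estimate_pitch_autocorr(frame, sr))
```

Because the sum is not divided by the number of overlapping samples, shorter lags are slightly favored, which helps avoid picking a multiple of the true period. Real speech and music would still need windowing, a voiced/unvoiced decision, and some protection against octave errors.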

 

This is why I want to use the librosa library to explore the spectral features of speech.

Spectral features

chroma_stft([y, sr, S, norm, n_fft, …]) Compute a chromagram from a waveform or power spectrogram.
chroma_cqt([y, sr, C, hop_length, fmin, …]) Constant-Q chromagram
chroma_cens([y, sr, C, hop_length, fmin, …]) Computes the chroma variant “Chroma Energy Normalized” (CENS), following [R15].
melspectrogram([y, sr, S, n_fft, …]) Compute a mel-scaled spectrogram.
mfcc([y, sr, S, n_mfcc]) Mel-frequency cepstral coefficients
rmse([y, S, frame_length, hop_length, …]) Compute root-mean-square (RMS) energy for each frame, either from the audio samples y or from a spectrogram S.
spectral_centroid([y, sr, S, n_fft, …]) Compute the spectral centroid.
spectral_bandwidth([y, sr, S, n_fft, …]) Compute p’th-order spectral bandwidth.
spectral_contrast([y, sr, S, n_fft, …]) Compute spectral contrast [R16]
spectral_rolloff([y, sr, S, n_fft, …]) Compute roll-off frequency
poly_features([y, sr, S, n_fft, hop_length, …]) Get coefficients of fitting an nth-order polynomial to the columns of a spectrogram.
tonnetz([y, sr, chroma]) Computes the tonal centroid features (tonnetz), following the method of [R17].
zero_crossing_rate(y[, frame_length, …]) Compute the zero-crossing rate of an audio time series.
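To make a few of the listed features concrete, here is a plain-Python sketch of what zero-crossing rate, RMS energy, and spectral centroid measure on a single frame. It is not librosa's implementation: librosa pads and centers frames by default and may normalize differently, and the function names below (`split_frames`, `frame_zero_crossing_rate`, `frame_rms`, `frame_spectral_centroid`) are hypothetical. The DFT is the naive O(N²) form, chosen only so the example needs no third-party package.

```python
import cmath
import math

def split_frames(samples, frame_length, hop_length):
    """Split a signal into overlapping frames, with no padding."""
    last_start = len(samples) - frame_length
    return [samples[s:s + frame_length] for s in range(0, last_start + 1, hop_length)]

def frame_zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_rms(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def frame_spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame's spectrum."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0

if __name__ == "__main__":
    sr = 6400
    tone = [math.sin(2 * math.pi * 800 * i / sr) for i in range(256)]
    for frame in split_frames(tone, 64, 64):
        print(frame_rms(frame), frame_spectral_centroid(frame, sr))
```

For speech, a high zero-crossing rate together with low RMS energy is a common rough cue for unvoiced consonants, while the centroid tracks how "bright" a frame sounds.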

 

One may also ponder linear prediction

Linear prediction

Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples.

In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a subset of filter theory. In system analysis (a subfield of mathematics), linear prediction can be viewed as a part of mathematical modelling or optimization.

The prediction model

The most common representation is

$$\widehat{x}(n) = \sum_{i=1}^{p} a_i \, x(n-i)$$

where $\widehat{x}(n)$ is the predicted signal value, $x(n-i)$ are the previously observed values, and $a_i$ are the predictor coefficients. The error generated by this estimate is

$$e(n) = x(n) - \widehat{x}(n)$$

where $x(n)$ is the true signal value.

These equations are valid for all types of (one-dimensional) linear prediction. The differences are found in the way the predictor coefficients $a_i$ are chosen.
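One common way to choose the coefficients is to minimize the total squared error, which leads to normal equations in the signal's autocorrelation that the Levinson-Durbin recursion solves efficiently. The sketch below uses that approach purely as an illustration; it is one choice among several, and the function names are hypothetical. It keeps the sign convention used here, where the prediction is the sum of a_i x(n-i); some libraries return coefficients with the opposite sign.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve sum_i a_i r[|j - i|] = r[j] for j = 1..order.

    Returns (a, err): a[i - 1] holds a_i of
    x_hat(n) = sum_{i=1}^{p} a_i x(n - i), err is the residual energy.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:
            raise ValueError("zero residual energy; use a lower order")
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err  # reflection coefficient for order m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err

def prediction_error(x, a):
    """e(n) = x(n) - sum_i a_i x(n - i), for n >= p where the past exists."""
    p = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(p))
            for n in range(p, len(x))]

if __name__ == "__main__":
    # A decaying exponential satisfies x(n) = 0.9 * x(n - 1) exactly.
    signal = [0.9 ** n for n in range(200)]
    coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
    print(coeffs, residual)
    print(max(abs(e) for e in prediction_error(signal, coeffs)))
```

On this test signal the first coefficient should come out very close to 0.9 and the second very close to zero, since one past sample already predicts the next. Speech frames are usually windowed first and modeled with a higher order.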

For multi-dimensional signals the error metric is often defined as

$$e(n) = \|x(n) - \widehat{x}(n)\|$$

where $\|\cdot\|$ is a suitably chosen vector norm. Predictions such as $\widehat{x}(n)$ are routinely used within Kalman filters and smoothers[1] to estimate current and past signal values, respectively.
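For a vector-valued sample, the error metric reduces to a single number once a norm is fixed. The short sketch below assumes the Euclidean norm, which is only one possible choice, and `vector_prediction_error` is a hypothetical name.

```python
import math

def vector_prediction_error(x_true, x_pred):
    """Euclidean norm ||x(n) - x_hat(n)|| for one vector-valued sample."""
    if len(x_true) != len(x_pred):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(x_true, x_pred)))

if __name__ == "__main__":
    print(vector_prediction_error([3.0, 4.0], [0.0, 0.0]))
```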

 

and what it is driving at ◎