W!o+ 的《小伶鼬工坊演義》︰神經網絡【FFT】六

人耳聽旋律,縱然不能分辨音高時值,卻知音符總有個先來後到。傳說莫札特絕對音感之好,即使是初聞的曲調,也能立刻譜出!!為求一次得到『兩個領域』── 時間、頻率 ── 的好處,於是人們設想了

短時距傅立葉變換

短時距傅立葉變換傅立葉變換的一種變形,為時頻分析中其中一個重要的工具。

傅立葉轉換在概念上的區別

將訊號做傅立葉變換後得到的結果,並不能給予關於訊號頻率隨時間改變的任何資訊。以下的例子作為說明:

x(t)=\begin{cases} \cos(440 \pi t); & t < 0.5 \\ \cos(660 \pi t); & 0.5 \leq t < 1 \\ \cos(524 \pi t); & t \ge 1 \end{cases}

傅立葉變換後的頻譜和短時距傅立葉轉換後的結果如下:

傅立葉轉換後, 橫軸為頻率(赫茲)

短時距傅立葉轉換, 橫軸為時間(秒),縱軸為頻率(赫茲)

由上圖可發現,傅立葉轉換只提供了有哪些頻率成份的資訊,卻沒有提供時間資訊;而短時傅立葉轉換則清楚的提供這兩種資訊。這種時頻分析的方法有利於頻率會隨著時間改變的訊號,如音樂訊號和語音訊號等分析。

定義

數學定義

簡單來說,在連續時間的例子,一個函數可以先乘上僅在一段時間不為零的窗函數再進行一維的傅立葉變換。再將這個窗函數沿著時間軸挪移,所得到一系列的傅立葉變換結果排開則成為二維表象。數學上,這樣的操作可寫為:

 X(t, f) = \int_{-\infty}^{\infty} w(t-\tau)x(\tau) e^{-j 2 \pi f \tau} \, d\tau

另外也可用角頻率來表示:

 X(t, \omega) = \int_{-\infty}^{\infty} w(t-\tau)x(\tau) e^{-j \omega \tau} \, d\tau

其中w(t)窗函數,窗函數種類有很多種,會在稍後再做仔細討論。x(t)是待變換的訊號。X(t,\omega)w(t-\tau)x(\tau)的傅立葉變換。 隨著t的改變,窗函數在時間軸上會有位移。經w(t-\tau)x(\tau)後,訊號只留下了窗函數截取的部分做最後的傅立葉轉換。

而反短時距傅立葉轉換,其數學類似傅立葉轉換,但須消除窗函數的作用:

 x(t)=w(t_1-t)^{-1} \int_{-\infty}^{\infty} X(t_1, f) e^{j 2 \pi f t}\, df ; w(t_1-t)\ne 0

窗函數

窗函數通常滿足下列特性:

  1. w(t) = w(-t) \,,即為偶函數。
  2. max(w(t))=w(0) \,,即窗函數的中央通常是最大值的位置。
  3. w(t_1)\ge w(t_2), |t_2| \ge |t_1|,即窗函數的值由中央開始向兩側單調遞減。
  4. w(t)\cong 0 , |t|\to \infty,即窗函數的值向兩側遞減為零。

常見的窗函數有:方形、三角形、高斯函數等,而短時距傅立葉轉換也因窗函數的不同而有不同的名稱。而加伯轉換,即為窗函數是高斯函數的短時距傅立葉轉換,通常沒有特別說明的短時距傅立葉轉換,即為加伯轉換

……

頻譜(Spectrogram)

Spectrogram即短時傅立葉轉換後結果的絕對值平方,兩者本質上是相同的,在文獻上也常出現spectrogram這個名詞。

SP_x(t,f) = |X(t,f)|^2 = | \int_{-\infty}^{\infty} w(t-\tau)x(\tau) e^{-j 2 \pi f \tau} \, d\tau |^2

───

 

誰知卻與『測不準原理』不期而遇︰

相對論將『觀察者』帶入物理,改變了『量測』的基本觀念。雖然無限精準的『測量』即使作不到,尚且還可以想像。但是量子力學把『量測』的『測不準』原理放進物理,就是說連想像『粒子』的『軌跡』在原理上都不『允許』!!量子力學是使用著『運算子』operator 的語言來描寫微小粒子之『事件概率』的『波函數』。那『測不準原理』是什麼呢?所謂『經典物理學』classic physics 對一個『物體』運動軌跡的描述是由它的『位置』和『速度』或說『動量』所確定的,一九二七年德國的維爾納‧海森堡 Werner Heisenberg  卻講任何『量子系統』之『量測』必為如下的關係式所制約︰

\Delta x \Delta p \ge  \hbar

\Delta t \Delta E \ge  \hbar

這並不是因為觀察者的量測,影響了系統── 比方說用粗大的溫度計量一小杯水的溫度 ──所導致的『觀察者效應』,而是宇宙的本質如此。所以即使是想像一個箱子裡的『電子軌跡』都沒有『旨趣』,你不量測它想說它是『波』或者是『粒子』之象純屬『無謂』。這引發一些物理學大方家不滿,認為量子力學根本尚未『完備』。就像發展完成量子力學『波動方程式』的埃爾溫‧薛丁格,他卻也是提出一個稱之為『薛丁格貓 』之想像實驗的人,用以表達目前量子力學之『哥本哈根詮釋』所必須思考的嚴峻性矛盾問題︰

300px-Schrodingers_cat.svg

薛丁格是如此描述這個實驗的:

實 驗者甚至可以設置出相當荒謬的案例來。把一隻貓關在一個封閉的鐵容器裡面,並且裝置以下儀器(注意必須保固 這儀器不被容器中的貓直接干擾):在一台蓋革計數器內置入極少量放射性物質,由於物質的數量極少,在一小時內,這個放射性物質至少有一個原子衰變的機率為 50%,它沒有任何原子衰變的機率也同樣為50%;假若衰變事件發生了,則蓋革計數管會放電,通過繼電器啟動一個榔頭,榔頭會打破裝有氰化氫的燒瓶。經過 一小時以後,假若沒有發生衰變事件,則貓仍舊存活;否則發生衰變,這套機構被觸發,氰化氫揮發,導致貓隨即死亡。用以描述整個事件的波函數竟然表達出了活 貓與死貓各半糾合在一起的狀態。
類似這典型案例的眾多案例裏,原本只局限於原子領域的不明確性被以一種巧妙的機制變為宏觀不明確性,只有通過打開這個箱子來直接觀察才能解除這樣的不明確 性。它使得我們難以如此天真地接受採用這種籠統的模型來正確代表實體的量子特性。就其本身的意義而言,它不會蘊含任何不清楚或矛盾的涵義。但是,在一張搖 晃或失焦的圖片與雲堆霧層的快照之間,實則有很大的不同之處。

不僅如此,在量子系統中,假使兩個粒子在經過短時間彼此間耦合之後,儘管將這兩個粒子分隔很遠的一段距離,量測其中任何一個粒子,會不可避免地影響到另外一個粒子的度量性質,彷彿有隔空的傳心術一般,這種關聯現象稱之為『量子糾纏 』quantum entanglement 。

─── 摘自《測不準原理

 

那個 \Delta t \Delta E \ge  \hbar 限制了 STFT 之『解析度』︰

Short-time Fourier transform

The short-time Fourier transform (STFT), or alternatively short-term Fourier transform, is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.[1] In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum on each shorter segment. One then usually plots the changing spectra as a function of time.

Impact_STFT

Example of short time Fourier transforms used to determine time of impact from audio signal.

……

Resolution issues

Further information: Gabor limit

One of the pitfalls of the STFT is that it has a fixed resolution. The width of the windowing function relates to how the signal is represented—it determines whether there is good frequency resolution (frequency components close together can be separated) or good time resolution (the time at which frequencies change). A wide window gives better frequency resolution but poor time resolution. A narrower window gives good time resolution but poor frequency resolution. These are called narrowband and wideband transforms, respectively.

Comparison of STFT resolution. Left has better time resolution, and right has better frequency resolution.

This is one of the reasons for the creation of the wavelet transform and multiresolution analysis, which can give good time resolution for high-frequency events and good frequency resolution for low-frequency events, the combination best suited for many real signals.

This property is related to the Heisenberg uncertainty principle, but not directly – see Gabor limit for discussion. The product of the standard deviation in time and frequency is limited. The boundary of the uncertainty principle (best simultaneous resolution of both) is reached with a Gaussian window function, as the Gaussian minimizes the Fourier uncertainty principle. This is called the Gabor transform (and with modifications for multiresolution becomes the Morlet wavelet transform).

One can consider the STFT for varying window size as a two-dimensional domain (time and frequency), as illustrated in the example below, which can be calculated by varying the window size. However, this is no longer a strictly time–frequency representation – the kernel is not constant over the entire signal.

───

 

Fourier transform on Euclidean space

The Fourier transform can be defined in any arbitrary number of dimensions n. As with the one-dimensional case, there are many conventions. For an integrable function f(x), this article takes the definition:

\hat{f}(\boldsymbol{\xi}) = \mathcal{F}(f)(\boldsymbol{\xi}) = \int_{\R^n} f(\mathbf{x}) e^{-2\pi i \mathbf{x}\cdot\boldsymbol{\xi}} \, d\mathbf{x}

where x and ξ are n-dimensional vectors, and x·ξ is the dot product of the vectors. The dot product is sometimes written as \left\langle \mathbf x, \boldsymbol \xi \right\rangle.

All of the basic properties listed above hold for the n-dimensional Fourier transform, as do Plancherel’s and Parseval’s theorem. When the function is integrable, the Fourier transform is still uniformly continuous and the Riemann–Lebesgue lemma holds. (Stein & Weiss 1971)

Uncertainty principle

For more details on this topic, see Gabor limit.

Generally speaking, the more concentrated f(x) is, the more spread out its Fourier transform \hat f(\xi) must be. In particular, the scaling property of the Fourier transform may be seen as saying: if we “squeeze” a function in x, its Fourier transform “stretches out” in ξ. It is not possible to arbitrarily concentrate both a function and its Fourier transform.

The trade-off between the compaction of a function and its Fourier transform can be formalized in the form of an uncertainty principle by viewing a function and its Fourier transform as conjugate variables with respect to the symplectic form on the time–frequency domain: from the point of view of the linear canonical transformation, the Fourier transform is rotation by 90° in the time–frequency domain, and preserves the symplectic form.

Suppose f(x) is an integrable and square-integrable function. Without loss of generality, assume that f(x) is normalized:

\int_{-\infty}^\infty |f(x)|^2 \,dx=1.

It follows from the Plancherel theorem that \hat f(\xi) is also normalized.

The spread around x = 0 may be measured by the dispersion about zero (Pinsky 2002, p. 131) defined by

D_0(f)=\int_{-\infty}^\infty x^2|f(x)|^2\,dx.

In probability terms, this is the second moment of |f(x)|2 about zero.

The Uncertainty principle states that, if f(x) is absolutely continuous and the functions x·f(x) and f′(x) are square integrable, then

D_0(f)D_0(\hat{f}) \geq \frac{1}{16\pi^2}    (Pinsky 2002).

The equality is attained only in the case f(x)=C_1 \, e^{-\pi x^2/\sigma^2} (hence \hat{f}(\xi)= \sigma C_1 \, e^{-\pi\sigma^2\xi^2}) where σ > 0 is arbitrary and C_1 = \sqrt[4]{2} / \sqrt{\sigma} so that f is L2–normalized (Pinsky 2002). In other words, where f is a (normalized) Gaussian function with variance σ2, centered at zero, and its Fourier transform is a Gaussian function with variance σ−2.

In fact, this inequality implies that:

\left(\int_{-\infty}^\infty (x-x_0)^2|f(x)|^2\,dx\right)\left(\int_{-\infty}^\infty(\xi-\xi_0)^2|\hat{f}(\xi)|^2\,d\xi\right)\geq \frac{1}{16\pi^2}

for any x0, ξ0R  (Stein & Shakarchi 2003, p. 158).

In quantum mechanics, the momentum and position wave functions are Fourier transform pairs, to within a factor of Planck’s constant. With this constant properly taken into account, the inequality above becomes the statement of the Heisenberg uncertainty principle (Stein & Shakarchi 2003, p. 158).

A stronger uncertainty principle is the Hirschman uncertainty principle, which is expressed as:

H(|f|^2)+H(|\hat{f}|^2)\ge \log(e/2)

where H(p) is the differential entropy of the probability density function p(x):

H(p) = -\int_{-\infty}^\infty p(x)\log(p(x)) \, dx

where the logarithms may be in any base that is consistent. The equality is attained for a Gaussian, as in the previous case.

───

 

由於 numpy 中並沒有直接計算 STFT 的函式,有興趣的讀者或可以安裝 scipy

scipy.signal.spectrogram

scipy.signal.spectrogram(x, fs=1.0, window=(‘tukey’, 0.25), nperseg=256, noverlap=None, nfft=None, detrend=’constant’, return_onesided=True, scaling=’density’, axis=-1, mode=’psd’)
Compute a spectrogram with consecutive Fourier transforms.

Spectrograms can be used as a way of visualizing the change of a nonstationary signal’s frequency content over time.

Parameters:

x : array_like

Time series of measurement values

fs : float, optional

Sampling frequency of the x time series. Defaults to 1.0.

window : str or tuple or array_like, optional

Desired window to use. See get_window for a list of windows and required parameters. If window is array_like it will be used directly as the window and its length will be used for nperseg. Defaults to a Tukey window with shape parameter of 0.25.

nperseg : int, optional

Length of each segment. Defaults to 256.

noverlap : int, optional

Number of points to overlap between segments. If None, noverlap = nperseg // 8. Defaults to None.

nfft : int, optional

Length of the FFT used, if a zero padded FFT is desired. If None, the FFT length is nperseg. Defaults to None.

detrend : str or function or False, optional

Specifies how to detrend each segment. If detrend is a string, it is passed as the type argument to detrend. If it is a function, it takes a segment and returns a detrended segment. If detrend is False, no detrending is done. Defaults to ‘constant’.

return_onesided : bool, optional

If True, return a one-sided spectrum for real data. If False return a two-sided spectrum. Note that for complex data, a two-sided spectrum is always returned.

scaling : { ‘density’, ‘spectrum’ }, optional

Selects between computing the power spectral density (‘density’) where Pxx has units of V**2/Hz and computing the power spectrum (‘spectrum’) where Pxx has units of V**2, if x is measured in V and fs is measured in Hz. Defaults to ‘density’

axis : int, optional

Axis along which the spectrogram is computed; the default is over the last axis (i.e. axis=-1).

mode : str, optional

Defines what kind of return values are expected. Options are [‘psd’, ‘complex’, ‘magnitude’, ‘angle’, ‘phase’].

Returns:

f : ndarray

Array of sample frequencies.

t : ndarray

Array of segment times.

Sxx : ndarray

Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times.

See also

periodogram
Simple, optionally modified periodogram
lombscargle
Lomb-Scargle periodogram for unevenly sampled data
welch
Power spectral density by Welch’s method.
csd
Cross spectral density by Welch’s method.

Notes

An appropriate amount of overlap will depend on the choice of window and on your requirements. In contrast to welch’s method, where the entire data stream is averaged over, one may wish to use a smaller overlap (or perhaps none at all) when computing a spectrogram, to maintain some statistical independence between individual segments.

New in version 0.16.0.

───

 

玩玩想想的乎!!??