W!o+'s 《小伶鼬工坊演義》: Neural Networks 【學而堯曰】 (From the First Chapter to the Last), Part 5

Suppose you want to understand what "entropy" really means?

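In chemical thermodynamics, entropy (English: Entropy[3]) is a measure of the total amount of energy that, in dynamical terms, cannot be used to do work: as the total entropy increases, the capacity to do work decreases, so the amount of entropy is an index of energy degradation. Entropy is also used to quantify the disorder of a system, that is, how chaotic the system is. Entropy is a function describing the state of a system, but analyses usually compare reference values and changes in entropy. It has important applications in cybernetics, probability theory, number theory, astrophysics, the life sciences, and other fields; different disciplines have derived more specific definitions from it, and it is a very important parameter in each of them.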

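[Figure: ice melting in water]

Melting ice: a classic example of increasing entropy[1], described in 1862 by Rudolf Clausius as an increase in the dispersal (disgregation) of the molecules in the body of ice[2].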

 

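But first one has to know the properties of the logarithm (log). Strange, isn't it! Napier's foresight has deep and long roots; perhaps it is no mere accident!!??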

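In 1614 John Napier published a book titled Mirifici Logarithmorum Canonis Descriptio (A Description of the Wonderful Canon of Logarithms), using thirty-seven pages to explain the logarithm (log) and giving ninety pages of logarithm tables. Why does this matter? Consider that even today, adding, subtracting, multiplying, and dividing numbers with many digits using only pencil and paper is tedious and error-prone, and you can see how much the invention of logarithms contributed to computation. Viewed as a one-to-one correspondence, the logarithm transforms multiplication and division into addition and subtraction (for positive a and b):

\log (a \cdot b) = \log a + \log b

\log (a / b) = \log a - \log b

and it also handles squares and cubes, as well as square roots, cube roots, and so on:

\log (a^n) = n \cdot \log a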
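The log-table procedure above can be mimicked in a few lines: look up the logarithms, add or subtract them, then take the antilogarithm. This is a minimal sketch, assuming positive inputs and base-10 logarithms as in common log tables; the helper names are made up for illustration.

```python
import math

# Hypothetical helpers: multiplication, division and powers done only with
# logarithms, the way log tables were used (look up, add/subtract, take antilog).
def multiply_via_logs(a: float, b: float) -> float:
    # log(a*b) = log a + log b   (a, b > 0)
    return 10 ** (math.log10(a) + math.log10(b))

def divide_via_logs(a: float, b: float) -> float:
    # log(a/b) = log a - log b   (a, b > 0)
    return 10 ** (math.log10(a) - math.log10(b))

def power_via_logs(a: float, n: float) -> float:
    # log(a^n) = n log a; n = 1/2 gives a square root, n = 1/3 a cube root
    return 10 ** (n * math.log10(a))

print(multiply_via_logs(123.0, 456.0))  # approximately 56088
print(divide_via_logs(56088.0, 456.0))  # approximately 123
print(power_via_logs(27.0, 1 / 3))      # approximately 3
```

Floating-point rounding makes the results approximate, much as a printed table was limited by how many digits it carried.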

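Napier is also credited with the calculating rods known as "Napier's bones", and his book strongly influenced the later development of astronomy, mechanics, physics, and astrology. His idea of transforming one operation into another opened the door to "solving a mathematical problem by moving to a different space", as in the Laplace transform for ordinary differential equations and the Fourier transform in spectral analysis.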

Plotted, the logarithm looks like this:

[Figure: graph of the logarithm function]

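Beyond that, this logarithmic relation is even connected to how strongly humans feel stimuli (sights, sounds, smells, tastes, touches) through the five senses (eyes, ears, nose, tongue, body). Ernst Heinrich Weber, born in 1795, a German physician and a pioneer of psychophysics, proposed the notion of the just-noticeable difference (JND) of a sensation. For example, if you are carrying five kilograms of water, adding another half kilogram may hardly feel different; but if you were carrying nothing, suddenly holding half a kilogram of water might feel heavy. In other words, how large a change is just noticeable depends on the stimulus already present. From his experiments Weber summarized this as the relation: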

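\Delta R / R = K

R: physical magnitude of the existing stimulus
ΔR: increase in the stimulus needed to reach the just-noticeable difference (JND)
K: a constant specific to each sense; different senses have different constants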

Later Gustav Theodor Fechner, a scholar in Weber's line, proposed a continuity hypothesis for perception and rewrote Weber's relation as:

dP = k  \frac {dS}{S}

Solving this differential equation gives:

P = k \ln S + C

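If stimuli whose magnitude is below S_0 are not perceived at all, so that P = 0 at the threshold S = S_0, then C = -k \ln S_0 and the formula above can be written as: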

P = k \ln \frac {S}{S_0}

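This is the well-known Weber-Fechner law. It states that above the absolute threshold S_0, the intensity of subjective perception varies as the natural logarithm of the physical magnitude of the stimulus; in other words, if the stimulus grows in geometric progression, the resulting sensation grows only in arithmetic progression.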

─── Excerpted from 《千江有水千江月》
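To see "geometric in, arithmetic out" concretely, here is a small sketch of the law P = k \ln(S/S_0). The threshold S_0 = 1 and k = 1 are arbitrary illustrative values, not measured constants, and the function name is hypothetical.

```python
import math

def perceived_intensity(S: float, S0: float = 1.0, k: float = 1.0) -> float:
    """Weber-Fechner law P = k ln(S / S0); stimuli below the threshold S0 give P = 0."""
    if S < S0:
        return 0.0
    return k * math.log(S / S0)

# Stimuli growing geometrically (each twice the previous one) ...
stimuli = [2.0 ** n for n in range(6)]          # 1, 2, 4, 8, 16, 32
percepts = [perceived_intensity(S) for S in stimuli]
# ... produce sensations growing arithmetically: every step is k ln 2
steps = [b - a for a, b in zip(percepts, percepts[1:])]
print(steps)  # each step is approximately ln 2 = 0.693...
```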

 

The global and local properties of the logarithm can be seen in the following:

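From the viewpoint of identities, a functional equation can be regarded as a functional identity, just like the trigonometric identity {[\sin{x}]}^2 + {[\cos{x}]}^2 = 1. If we use that identity to rewrite the identity \sin{(x + y)} = \sin{(x)} \cos{(y)} + \cos{(x)} \sin{(y)} as \sin{(x + y)} = \sin{(x)} \sqrt{1 - {[\sin{y}]}^2} + \sqrt{1 - {[\sin{x}]}^2} \sin{(y)} (valid where \cos{x} \ge 0 and \cos{y} \ge 0, since the square root takes the non-negative branch), it looks every bit like a functional equation! From the same viewpoint we can therefore regard a differential equation as a kind of functional identity, and see that even without solving the equation we can still extract a great deal of important and useful information about its solution functions!!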
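A quick numerical check of the rewritten identity and of its caveat: because \sqrt{1 - [\sin t]^2} = |\cos t|, the sine-only form agrees with \sin(x+y) only while both cosines are non-negative. The test points and the function name are arbitrary choices for illustration.

```python
import math

def sin_sum_sine_only(x: float, y: float) -> float:
    """sin(x + y) written with sine alone, using cos t = sqrt(1 - sin^2 t).
    The square root returns |cos t|, so this is correct only when cos x >= 0 and cos y >= 0."""
    sx, sy = math.sin(x), math.sin(y)
    return sx * math.sqrt(1 - sy ** 2) + math.sqrt(1 - sx ** 2) * sy

print(math.isclose(sin_sum_sine_only(0.3, 1.1), math.sin(0.3 + 1.1)))  # True: both cosines positive
print(math.isclose(sin_sum_sine_only(0.3, 2.5), math.sin(0.3 + 2.5)))  # False: cos(2.5) < 0
```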

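Earlier we used the mean value theorem,

If a real function f is continuous on the closed interval [a, b] and differentiable on the open interval (a, b), then there exists a point c, \ a < c < b, at which the slope of the tangent equals the slope of the secant through the two endpoints, i.e. f^{\prime}(c) = \frac{f(b) - f(a)}{b - a}

to prove Liouville's theorem. The importance of the mean value theorem is that, for a continuous and differentiable function, it connects the secant through the endpoints of an interval with a tangent inside that interval, which lets us establish that a certain equality holds. Let us take another example, a logarithmic function satisfying f(x \cdot y) = f(x) + f(y), and see the theorem in action. First, f(1) = f(1 \cdot 1) = f(1) + f(1) \Longrightarrow f(1) = 0; next, f(x \cdot \frac{1}{x}) = f(1) = 0 = f(x) + f(\frac{1}{x}) \Longrightarrow f(\frac{1}{x}) = - f(x); so f(\frac{x}{y}) = f(x \cdot \frac{1}{y}) = f(x) + f(\frac{1}{y}) = f(x) -f(y). Therefore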

f(x + \delta x) - f(x) = f(\frac{x + \delta x}{x}) = f(1 + \frac{\delta x}{ x})

= f^{\prime}(\eta) \left[(1 + \frac{\delta x}{x}) - 1 \right], \ \eta \in (1, 1 + \frac{\delta x}{x})

= f^{\prime}(\eta) \frac{\delta x}{x}

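Why is that? Because f(x) is assumed to be smooth on the closed interval [1, 1 + \frac{\delta x}{x}] (taking \delta x > 0), so by the mean value theorem there exists an \eta \in (1, 1 + \frac{\delta x}{x}) such that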

f^{\prime}(\eta) = \frac{f( 1 + \frac{\delta x}{ x}) - f(1)}{(1 + \frac{\delta x}{x})  - 1} = \frac{f( 1 + \frac{\delta x}{ x})}{ \frac{\delta x}{x}}

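\therefore f(x + \delta x) = f(x) + f^{\prime}(\eta) \frac{\delta x}{x}. On the other hand, since f is differentiable at x, f(x + \delta x) = f(x) + f^{\prime}(x) \cdot \delta x + \epsilon \cdot \delta x with \epsilon \to 0 as \delta x \to 0. Comparing the two expressions, we obtain

f^{\prime}(x) = \frac{f^{\prime}(\eta)}{x} - \epsilon. Letting \delta x \to 0, we have \eta \to 1 and \epsilon \to 0, so (with f^{\prime} continuous at 1) the function f(x) satisfies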

f^{\prime}(x) = \frac{k}{x} , \ f(1)= 0, \ k= f^{\prime}(1)

Its solution is indeed f(x) = k \ln{(x)}!!

─── Excerpted from 《【Sonic π】電聲學之電路學《四》之《一》》
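The conclusion can also be checked numerically without knowing the answer in advance: integrate f'(t) = k/t starting from f(1) = 0 with the composite trapezoid rule and compare with k \ln x. A minimal sketch; the value k = 2, the step count, and the helper name are arbitrary choices.

```python
import math

def integrate_k_over_t(x: float, k: float = 2.0, steps: int = 10_000) -> float:
    """Solve f'(t) = k / t with f(1) = 0 numerically: composite trapezoid rule for
    the integral of k / t from 1 to x (x > 0; for x < 1 the step h is negative)."""
    h = (x - 1.0) / steps
    total = 0.5 * (k / 1.0 + k / x)
    for i in range(1, steps):
        total += k / (1.0 + i * h)
    return total * h

for x in (0.5, 3.0, 10.0):
    print(x, integrate_k_over_t(x), 2.0 * math.log(x))  # the last two columns agree closely

# The numerical solution also inherits f(x * y) = f(x) + f(y):
print(integrate_k_over_t(15.0), integrate_k_over_t(3.0) + integrate_k_over_t(5.0))
```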

 

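So we now know that the derivative of \ln(x) is \frac{1}{x}? It seems the universe hides some "great cause" here!! Is that why the form \bigcirc \cdot \ln(\bigcirc) shows up so often?? If we plot it,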

[Figure: plot of y = x \ln x]

 

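why does it look like the "pipe" that is not a pipe in Magritte's The Treachery of Images??!! How do we know that its mouthpiece does not smoke, i.e. \lim \limits_{x \to 0^+} x \log{x} = 0? Just ask the logarithm's brother, the exponential exp?! Setting x = e^{-t}, so that x \to 0^+ exactly when t \to \infty, we find

\lim \limits_{x \to 0^+} x \log{x} = \lim \limits_{t \to \infty} e^{-t} \log e^{-t} = \lim \limits_{t \to \infty } \frac{- t}{e^t} = 0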

Indeed!!! Not to mention that they can even get a grip on the factorial???
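Returning to the limit \lim_{x \to 0^+} x \log x = 0, the substitution x = e^{-t} is easy to watch numerically: x \log x and -t/e^t are the same number, and both collapse toward 0 as t grows. A small sketch; the sample values of t are arbitrary.

```python
import math

def x_log_x(x: float) -> float:
    """x * ln(x) for x > 0; it tends to 0 as x -> 0+."""
    return x * math.log(x)

for t in (1, 5, 10, 20, 40):
    x = math.exp(-t)                      # x = e^{-t} -> 0+ as t -> infinity
    print(f"t={t:>2}  x={x:.3e}  x*ln(x)={x_log_x(x):.3e}  -t/e^t={-t / math.exp(t):.3e}")
```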

Stirling's formula

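Stirling's formula is a mathematical formula for approximating the factorial of n. In general, computing n! is expensive when n is large, so Stirling's formula is very useful; moreover, even for fairly small n its value is already quite accurate.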

The formula is:

n! \approx \sqrt{2\pi n}\, \left(\frac{n}{e}\right)^{n}.

That is, for sufficiently large integers n, the two quantities approximate each other. More precisely:

\lim_{n \rightarrow \infty} {\frac{n!}{\sqrt{2\pi n}\, \left(\frac{n}{e}\right)^{n}}} = 1

\lim_{n \rightarrow \infty} {\frac{e^n\, n!}{n^n \sqrt{n}}} = \sqrt{2 \pi}.
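The limit above is easy to observe with exact integer factorials. A minimal sketch using Python's math.factorial; the sample values of n are arbitrary, and n should stay below about 170, where the floating-point values overflow.

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation sqrt(2 pi n) (n / e)^n to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 2, 5, 10, 50, 100):
    ratio = math.factorial(n) / stirling(n)
    print(n, ratio)  # decreases toward 1; the excess is close to 1 / (12 n)
```

The 1/(12n) excess is the k = 2 term of the Euler–Maclaurin series in the derivation that follows, since B_2 = 1/6.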

……

Derivation

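This formula, together with an error estimate, can be derived as follows. Instead of estimating n! directly, we consider its natural logarithm: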

\ln(n!) = \ln 1 + \ln 2 + \cdots + \ln n.

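The right-hand side of this equation is an approximation (by the trapezoidal rule) of the integral \int_1^n \ln(x)\,dx = n \ln n - n + 1, and its error is given by the Euler–Maclaurin formula: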

\ln (n!) - \frac{\ln n}{2} = \ln 1 + \ln 2 + \cdots + \ln(n-1) + \frac{\ln n}{2} = n \ln n - n + 1 + \sum_{k=2}^{m} \frac{B_k {(-1)}^k}{k(k-1)} \left( \frac{1}{n^{k-1}} - 1 \right) + R_{m,n},

where B_k is a Bernoulli number and R_{m,n} is the remainder term in the Euler–Maclaurin formula. Taking the limit, we get:

\lim_{n \to \infty} \left( \ln n! - n \ln n + n - \frac{\ln n}{2} \right) = 1 - \sum_{k=2}^{m} \frac{B_k {(-1)}^k}{k(k-1)} + \lim_{n \to \infty} R_{m,n}.

We denote this limit by y. Since the remainder R_{m,n} in the Euler–Maclaurin formula satisfies

R_{m,n} = \lim_{n \to \infty} R_{m,n} + O \left( \frac{1}{n^{2m-1}} \right),

where we have used big O notation, combining this with the equations above gives the approximation in logarithmic form:

\ln n! = n \ln \left( \frac{n}{e} \right) + \frac{\ln n}{2} + y + \sum_{k=2}^{m} \frac{B_k {(-1)}^k}{k(k-1)n^{k-1}} + O \left( \frac{1}{n^{2m-1}} \right).

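Taking exponentials of both sides and choosing any positive integer m, we obtain a formula involving the unknown quantity e^y. For m = 1, the formula is: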

n! = e^{y} \sqrt{n}~{\left( \frac{n}{e} \right)}^n \left[ 1 + O \left( \frac{1}{n} \right) \right]

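Substituting this expression into the Wallis product formula and letting n tend to infinity yields e^y (namely e^y = \sqrt{2 \pi}). Hence we obtain Stirling's formula: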

n! = \sqrt{2 \pi n}~{\left( \frac{n}{e} \right)}^n \left[ 1 + O \left( \frac{1}{n} \right) \right]

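The formula can also be obtained by repeated integration by parts, and its leading term can be obtained by the method of steepest descent. Replacing the sum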

\ln(n!) = \sum_{j=1}^{n} \ln j

by an integral yields Stirling's formula without the factor \sqrt{2 \pi n} (this factor is often unimportant in practical applications):

\sum_{j=1}^{n} \ln j \approx \int_1^n \ln x \, dx = n\ln n - n + 1.

───
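Both pieces of the Stirling derivation can be checked numerically with math.lgamma, which returns \ln\Gamma(n+1) = \ln n!: the quantity whose limit defines y approaches \ln\sqrt{2\pi}, and the crude integral approximation n \ln n - n + 1 falls short of \ln n! by roughly \frac{\ln n}{2} + y - 1. A sketch; the sample values of n and the helper names are arbitrary.

```python
import math

LN_SQRT_2PI = 0.5 * math.log(2 * math.pi)   # the value of y implied by e^y = sqrt(2 pi)

def y_estimate(n: int) -> float:
    """ln n! - n ln n + n - (ln n)/2, whose limit as n -> infinity is y."""
    return math.lgamma(n + 1) - n * math.log(n) + n - 0.5 * math.log(n)

def crude_log_factorial(n: int) -> float:
    """Integral approximation n ln n - n + 1, i.e. Stirling without the sqrt(2 pi n) factor."""
    return n * math.log(n) - n + 1

for n in (10, 1_000, 100_000):
    gap = math.lgamma(n + 1) - crude_log_factorial(n)
    print(n, y_estimate(n), gap, 0.5 * math.log(n) + LN_SQRT_2PI - 1)
print("ln sqrt(2 pi) =", LN_SQRT_2PI)
```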

 

Hence they are dearly loved by statistical mechanics:

Maxwell–Boltzmann statistics

In statistical mechanics, Maxwell–Boltzmann statistics describes the average distribution of non-interacting material particles over various energy states in thermal equilibrium, and is applicable when the temperature is high enough or the particle density is low enough to render quantum effects negligible. The expected number of particles with energy \varepsilon_i for Maxwell–Boltzmann statistics is \langle N_i \rangle where:

 \langle N_i \rangle = \frac {g_i} {e^{(\varepsilon_i-\mu)/kT}} = \frac{N}{Z}\,g_i e^{-\varepsilon_i/kT}

where:

  • \varepsilon_i is the i-th energy level
  • \langle N_i \rangle is the number of particles in the set of states with energy \varepsilon_i
  • g_i is the degeneracy of energy level i, that is, the number of states with energy \varepsilon_i which may nevertheless be distinguished from each other by some other means.[nb 1]
  • μ is the chemical potential
  • k is Boltzmann’s constant
  • T is absolute temperature
  • N is the total number of particles: N=\sum_i N_i\,
  • Z is the partition function: Z=\sum_i g_i e^{-\varepsilon_i/kT}

Equivalently, the particle number is sometimes expressed as

 \langle N_i \rangle = \frac {1} {e^{(\varepsilon_i-\mu)/kT}} = \frac{N}{Z}\,e^{-\varepsilon_i/kT}

where the index i  now specifies a particular state rather than the set of all states with energy \varepsilon_i, and Z=\sum_i e^{-\varepsilon_i/kT}

……

Derivation from microcanonical ensemble

Suppose we have a container with a huge number of very small particles all with identical physical characteristics (such as mass, charge, etc.). Let’s refer to this as the system. Assume that though the particles have identical properties, they are distinguishable. For example, we might identify each particle by continually observing their trajectories, or by placing a marking on each one, e.g., drawing a different number on each one as is done with lottery balls.

The particles are moving inside that container in all directions with great speed. Because the particles are speeding around, they possess some energy. The Maxwell–Boltzmann distribution is a mathematical function that speaks about how many particles in the container have a certain energy.

In general, there may be many particles with the same amount of energy \varepsilon. Let the number of particles with the same energy \varepsilon_1 be N_1, the number of particles possessing another energy \varepsilon_2 be N_2, and so forth for all the possible energies {\varepsilon_i | i=1,2,3,…}. To describe this situation, we say that N_i is the occupation number of the energy level i. If we know all the occupation numbers {N_i | i=1,2,3,…}, then we know the total energy of the system. However, because we can distinguish between which particles are occupying each energy level, the set of occupation numbers {N_i | i=1,2,3,…} does not completely describe the state of the system. To completely describe the state of the system, or the microstate, we must specify exactly which particles are in each energy level. Thus when we count the number of possible states of the system, we must count each and every microstate, and not just the possible sets of occupation numbers.

To begin with, let's ignore the degeneracy problem: assume that there is only one way to put N_i particles into the energy level i. What follows is a bit of combinatorial reasoning that has little to do with accurately describing the reservoir of particles.

The number of different ways of performing an ordered selection of one single object from N objects is obviously N. The number of different ways of selecting two objects from N objects, in a particular order, is thus N(N − 1), and that of selecting n objects in a particular order is seen to be N!/(N − n)!. It is divided by the number of permutations, n!, if order does not matter. The binomial coefficient, N!/(n!(N − n)!), is, thus, the number of ways to pick n objects from N. If we now have a set of boxes labelled a, b, c, d, e, …, k, then the number of ways of selecting N_a objects from a total of N objects and placing them in box a, then selecting N_b objects from the remaining N − N_a objects and placing them in box b, then selecting N_c objects from the remaining N − N_a − N_b objects and placing them in box c, and continuing until no object is left outside is

W = \frac{N!}{N_a!(N-N_a)!} \cdot \frac{(N-N_a)!}{N_b!(N-N_a-N_b)!} \cdot \frac{(N-N_a-N_b)!}{N_c!(N-N_a-N_b-N_c)!} \cdots \frac{(N-N_a-\cdots-N_l)!}{N_k!(N-N_a-\cdots-N_l-N_k)!} = \frac{N!}{N_a!\,N_b!\,N_c!\cdots N_k!\,(N-N_a-\cdots-N_l-N_k)!}

Because not a single object is to be left outside the boxes, the sum N_a + N_b + N_c + N_d + N_e + … + N_k must equal N, so the term (N − N_a − N_b − N_c − … − N_l − N_k)! in the relation above evaluates to 0! = 1, which makes it possible to write that relation as

W = N! \prod_{i=a,b,c,\ldots}^{k} \frac{1}{N_i!}

Now going back to the degeneracy problem which characterizes the reservoir of particles. If the i-th box has a "degeneracy" of g_i, that is, it has g_i "sub-boxes", such that any way of filling the i-th box where the number in the sub-boxes is changed is a distinct way of filling the box, then the number of ways of filling the i-th box must be multiplied by the number of ways of distributing the N_i objects in the g_i "sub-boxes". The number of ways of placing N_i distinguishable objects in g_i "sub-boxes" is g_i^{N_i} (the first object can go into any of the g_i boxes, the second object can also go into any of the g_i boxes, and so on). Thus the number of ways W that a total of N particles can be classified into energy levels according to their energies, while each level i having g_i distinct states such that the i-th level accommodates N_i particles is:

W=N!\prod \frac{g_i^{N_i}}{N_i!}

This is the form for W first derived by Boltzmann. Boltzmann's fundamental equation S=k\,\ln W relates the thermodynamic entropy S to the number of microstates W, where k is the Boltzmann constant. However, Gibbs pointed out that the above expression for W does not yield an extensive entropy, and is therefore faulty. This problem is known as the Gibbs paradox. The problem is that the above equation treats the particles as distinguishable. In other words, for two particles (A and B) in two energy sublevels the population represented by [A,B] is considered distinct from the population [B,A], while for indistinguishable particles they are not. If we carry out the argument for indistinguishable particles, we are led to the Bose–Einstein expression for W:

W=\prod_i \frac{(N_i+g_i-1)!}{N_i!(g_i-1)!}

The Maxwell–Boltzmann distribution follows from this Bose–Einstein distribution for temperatures well above absolute zero, implying that g_i\gg 1. The Maxwell–Boltzmann distribution also requires low density, implying that g_i\gg N_i. Under these conditions, we may use Stirling’s approximation for the factorial:

 N! \approx N^N e^{-N},

to write:

W\approx\prod_i \frac{(N_i+g_i)^{N_i+g_i}}{N_i^{N_i}g_i^{g_i}}\approx\prod_i \frac{g_i^{N_i}(1+N_i/g_i)^{g_i}}{N_i^{N_i}}

Using the fact that (1+N_i/g_i)^{g_i}\approx e^{N_i} for g_i\gg N_i we can again use Stirling's approximation to write:

W\approx\prod_i \frac{g_i^{N_i}}{N_i!}

This is essentially a division by N! of Boltzmann’s original expression for W, and this correction is referred to as correct Boltzmann counting.
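Two checks of the counting above, sketched on toy numbers chosen only for illustration (all helper names are hypothetical). First, brute-force enumeration for a tiny system confirms Boltzmann's W = N!\prod g_i^{N_i}/N_i! for distinguishable particles. Second, for a single level with g \gg N, the Bose–Einstein count \frac{(N+g-1)!}{N!(g-1)!} approaches the corrected count g^N/N!; logarithms of factorials are taken with math.lgamma to avoid huge integers.

```python
import itertools
import math
from collections import Counter

def brute_force_W(occupations, degeneracies):
    """Count assignments of N labelled particles to sub-states (level i has g_i of them)
    such that level i ends up holding exactly occupations[i] particles."""
    N = sum(occupations)
    sub_states = [(level, s) for level, g in enumerate(degeneracies) for s in range(g)]
    count = 0
    for assignment in itertools.product(sub_states, repeat=N):  # particle j -> sub-state
        per_level = Counter(level for level, _ in assignment)
        if all(per_level[i] == n for i, n in enumerate(occupations)):
            count += 1
    return count

def boltzmann_W(occupations, degeneracies):
    """Boltzmann's W = N! * prod(g_i^N_i / N_i!), in exact integer arithmetic."""
    W = math.factorial(sum(occupations))
    for n in occupations:
        W //= math.factorial(n)   # exact: each partial quotient N!/(N_1!...N_j!) is an integer
    for n, g in zip(occupations, degeneracies):
        W *= g ** n
    return W

print(brute_force_W((2, 1), (2, 3)), boltzmann_W((2, 1), (2, 3)))  # 36 36

def ln_W_bose_einstein(n, g):
    """ln[(n + g - 1)! / (n! (g - 1)!)] for a single level."""
    return math.lgamma(n + g) - math.lgamma(n + 1) - math.lgamma(g)

def ln_W_corrected(n, g):
    """ln[g^n / n!]: the 'correct Boltzmann counting' form for a single level."""
    return n * math.log(g) - math.lgamma(n + 1)

for g in (20, 1_000, 1_000_000):   # N = 10 particles, growing degeneracy
    print(g, ln_W_bose_einstein(10, g) - ln_W_corrected(10, g))  # gap shrinks toward 0
```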

We wish to find the N_i for which the function W is maximized, while considering the constraint that there is a fixed number of particles \left(N=\textstyle\sum N_i\right) and a fixed energy \left(E=\textstyle\sum N_i \varepsilon_i\right) in the container. The maxima of W and \ln(W) are achieved by the same values of N_i and, since it is easier to accomplish mathematically, we will maximize the latter function instead. We constrain our solution using Lagrange multipliers forming the function:

 f(N_1,N_2,\ldots,N_n)=\ln(W)+\alpha(N-\sum N_i)+\beta(E-\sum N_i \varepsilon_i)
 \ln W=\ln\left[\prod\limits_{i=1}^{n}\frac{g_i^{N_i}}{N_i!}\right] \approx \sum\limits_{i=1}^n\left(N_i\ln g_i-N_i\ln N_i + N_i\right)

Finally

 f(N_1,N_2,\ldots,N_n)=\alpha N +\beta E + \sum\limits_{i=1}^n\left(N_i\ln g_i-N_i\ln N_i + N_i-(\alpha+\beta\varepsilon_i) N_i\right)

In order to maximize the expression above we apply Fermat's theorem (stationary points), according to which local extrema, if they exist, must be at critical points (where the partial derivatives vanish):

 \frac{\partial f}{\partial N_i}=\ln g_i-\ln N_i -(\alpha+\beta\varepsilon_i) = 0

By solving the equations above (i=1\ldots n) we arrive at an expression for N_i:

 N_i = \frac{g_i}{e^{\alpha+\beta \varepsilon_i}}

Substituting this expression for N_i into the equation for \ln W and assuming that N\gg 1 yields:

\ln W = (\alpha+1) N+\beta E\,

or, rearranging:

E=\frac{\ln W}{\beta}-\frac{N}{\beta}-\frac{\alpha N}{\beta}

Boltzmann realized that this is just an expression of the Euler-integrated fundamental equation of thermodynamics. Identifying E as the internal energy, the Euler-integrated fundamental equation states that :

E=TS-PV+\mu N

where T is the temperature, P is pressure, V is volume, and μ is the chemical potential. Boltzmann's famous equation S=k\,\ln W is the realization that the entropy is proportional to \ln W with the constant of proportionality being Boltzmann's constant. Using the ideal gas equation of state (PV=NkT), it follows immediately that \beta=1/kT and \alpha=-\mu/kT so that the populations may now be written:

 N_i = \frac{g_i}{e^{(\varepsilon_i-\mu)/kT}}

Note that the above formula is sometimes written:

 N_i = \frac{g_i}{e^{\varepsilon_i/kT}/z}

where z=\exp(\mu/kT) is the absolute activity.

Alternatively, we may use the fact that

\sum_i N_i=N\,

to obtain the population numbers as

 N_i = N\frac{g_i e^{-\varepsilon_i/kT}}{Z}

where Z is the partition function defined by:

 Z = \sum_i g_i e^{-\varepsilon_i/kT}
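A small numerical sketch tying the derivation together, on a hypothetical four-level system (energies, degeneracies, N and kT are made-up values, with Boltzmann's constant absorbed into kT). It computes the populations N_i = N g_i e^{-\varepsilon_i/kT}/Z, reads off \beta = 1/kT and \alpha from N_i = g_i e^{-(\alpha + \beta\varepsilon_i)} (which gives e^{-\alpha} = N/Z), then confirms the stationarity condition and the relation \ln W = (\alpha + 1)N + \beta E, using the Stirling form of \ln W from the derivation.

```python
import math

# Hypothetical toy system (arbitrary units, Boltzmann's constant folded into kT)
energies = [0.0, 1.0, 2.0, 3.0]
degeneracies = [1e6, 3e6, 5e6, 7e6]
N, kT = 1e5, 1.5

Z = sum(g * math.exp(-e / kT) for g, e in zip(degeneracies, energies))   # partition function
populations = [N * g * math.exp(-e / kT) / Z for g, e in zip(degeneracies, energies)]

beta = 1.0 / kT
alpha = math.log(Z / N)   # from g_i exp(-(alpha + beta eps_i)) = N g_i exp(-beta eps_i) / Z
E = sum(n * e for n, e in zip(populations, energies))

# Stationarity: df/dN_i = ln g_i - ln N_i - (alpha + beta eps_i) should vanish
gradient = [math.log(g) - math.log(n) - (alpha + beta * e)
            for g, n, e in zip(degeneracies, populations, energies)]
# Stirling form of ln W, evaluated at the maximizing populations
ln_W = sum(n * math.log(g) - n * math.log(n) + n for n, g in zip(populations, degeneracies))

print(sum(populations), N)               # populations add up to N
print(max(abs(d) for d in gradient))     # ~0 (rounding error only)
print(ln_W, (alpha + 1) * N + beta * E)  # the two agree
```

Since \alpha = -\mu/kT, the same numbers give the chemical potential as \mu = -kT \ln(Z/N).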

In an approximation where εi is considered to be a continuous variable, the Thomas-Fermi approximation yields a continuous degeneracy g proportional to \sqrt{\varepsilon} so that:

 \frac{dN}{N} = \frac{\sqrt{\varepsilon}\,e^{-\varepsilon/k T}\,d\varepsilon}{\int_0^\infty\sqrt{\varepsilon'}\,e^{-\varepsilon'/k T}\,d\varepsilon'}

which is just the Maxwell-Boltzmann distribution for the energy.

───
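As a numerical sanity check of the Maxwell-Boltzmann energy distribution quoted above, a midpoint-rule sketch (cutoff, grid size, and kT = 2 are arbitrary choices) reproduces the normalizing integral \int_0^\infty \sqrt{\varepsilon}\,e^{-\varepsilon/kT}\,d\varepsilon = \frac{\sqrt{\pi}}{2}(kT)^{3/2} and the mean energy \frac{3}{2}kT of an ideal monatomic gas. Neither closed form appears in the quoted text; both are standard textbook results used here only as reference values.

```python
import math

def mb_energy_density(eps: float, kT: float) -> float:
    """Unnormalized Maxwell-Boltzmann energy density sqrt(eps) * exp(-eps / kT)."""
    return math.sqrt(eps) * math.exp(-eps / kT)

def moments(kT: float, cutoff: float = 40.0, n: int = 100_000):
    """Midpoint-rule normalizing integral and mean energy on [0, cutoff * kT]."""
    h = cutoff * kT / n
    norm = mean_num = 0.0
    for i in range(n):
        eps = (i + 0.5) * h
        w = mb_energy_density(eps, kT) * h
        norm += w
        mean_num += eps * w
    return norm, mean_num / norm

kT = 2.0
norm, mean_energy = moments(kT)
print(norm, math.sqrt(math.pi) / 2 * kT ** 1.5)  # normalizing integral vs closed form
print(mean_energy, 1.5 * kT)                      # mean energy vs (3/2) kT
```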

 

Thus the very expression for entropy, S = k \ln W, emerges as well. In the end, though, the problem of "attaching labels"

[Figure: pirate ship]

Ship of Theseus

Plutarch, writing in the Greco-Roman era and drawing on an ancient Greek legend, recorded:

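The thirty-oared ship in which Theseus and the young people of Athens returned from Crete was kept by the Athenians as a monument. As time passed and its timbers decayed, the Athenians replaced them with new wood, until eventually every plank of the ship had been replaced. So the ancient Greek philosophers began to ask: "Is this still the original Ship of Theseus? If it is, it no longer has a single one of its original planks; if it is not, then from what moment did it stop being so?"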

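In logic this question of identity corresponds to the "law of identity", which is as famous as the "law of non-contradiction" (a statement cannot be both true and false):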

\forall x, \ x = x

─── Excerpted from 《Thue 之改寫系統《三》》

 

led to the Gibbs paradox:

Gibbs paradox

In statistical mechanics, a semi-classical derivation of the entropy that does not take into account the indistinguishability of particles, yields an expression for the entropy which is not extensive (is not proportional to the amount of substance in question). This leads to a paradox known as the Gibbs paradox, after Josiah Willard Gibbs. The paradox allows for the entropy of closed systems to decrease, violating the second law of thermodynamics. A related paradox is the “mixing paradox”. If one takes the perspective that the definition of entropy must be changed so as to ignore particle permutation, the paradox is averted.

───

 

So in matters of attribution, how could one not be careful!!!
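The non-extensivity behind the Gibbs paradox can be seen numerically. In this sketch (toy occupations and degeneracies chosen only for illustration; helper names are hypothetical), doubling a system means doubling every N_i and, at fixed density, every g_i, so an extensive \ln W would exactly double. Boltzmann's original count overshoots by about 2N \ln 2, which grows with N, while the corrected count \prod g_i^{N_i}/N_i! misses only by a term of order \ln N, coming from the \sqrt{2\pi n} factor in Stirling's formula and negligible for macroscopic N. Logarithms of factorials use math.lgamma.

```python
import math

def ln_W_corrected(occupations, degeneracies):
    """ln of prod(g_i^N_i / N_i!): Boltzmann's W divided by N!."""
    return sum(n * math.log(g) - math.lgamma(n + 1) for n, g in zip(occupations, degeneracies))

def ln_W_boltzmann(occupations, degeneracies):
    """ln of Boltzmann's original W = N! * prod(g_i^N_i / N_i!) (distinguishable particles)."""
    return math.lgamma(sum(occupations) + 1) + ln_W_corrected(occupations, degeneracies)

def extensivity_excess(ln_W, occupations, degeneracies):
    """ln W(doubled system) - 2 ln W(original); zero for a perfectly extensive entropy."""
    doubled_occ = [2 * n for n in occupations]
    doubled_deg = [2 * g for g in degeneracies]
    return ln_W(doubled_occ, doubled_deg) - 2 * ln_W(occupations, degeneracies)

occ, deg = [1000, 500], [10**6, 10**6]   # hypothetical toy system
N = sum(occ)
print(extensivity_excess(ln_W_boltzmann, occ, deg), 2 * N * math.log(2))  # large, about 2 N ln 2
print(extensivity_excess(ln_W_corrected, occ, deg))                      # small, of order ln N
```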