Time Series: Anscombe's Quartet

'Seeing is believing': grasping both the local and the global relationships hidden in statistical data is the very theme of

Anscombe's quartet

Anscombe's quartet consists of four datasets whose basic statistical properties are essentially identical, yet which look completely different when graphed. Each dataset contains eleven (x, y) points. The four datasets were constructed in 1973 by the statistician Francis Anscombe to demonstrate the importance of plotting data before analyzing it, and how large an effect outliers can have on statistical measures.

[Figure: the four scatter plots of Anscombe's quartet]

The four datasets share the following statistical properties:

Property                      Value
Mean of x                     9
Variance of x                 11
Mean of y                     7.50 (to two decimal places)
Variance of y                 4.122 or 4.127 (to three decimal places)
Correlation between x and y   0.816 (to three decimal places)
Linear regression line        y = 3.00 + 0.500x (to two and three decimal places respectively)

Among the four plots, the one drawn from the first dataset (top left) looks the most "normal", showing an apparent correlation between the two random variables. The plot of the second dataset (top right) makes it obvious that the relationship between the two variables is nonlinear. In the third dataset (bottom left) the relationship is linear, but a single outlier shifts the regression line and drags the correlation coefficient down from 1 to 0.816. Finally, in the fourth example (bottom right), although there is no linear relationship between the two variables, a single outlier is enough to make the correlation coefficient very high.

Edward Tufte uses Anscombe's quartet on the very first page of his book The Visual Display of Quantitative Information to illustrate the importance of graphing data.

The specific values of the four datasets are listed below. Note that the x values are identical for the first three datasets.

Anscombe's quartet
     I            II            III            IV
  x     y      x     y      x     y      x     y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
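
As a quick check of the claims above, here is a minimal sketch, using only the Python standard library, that recomputes the shared summary statistics and fitted line from the table of values:

```python
# Recompute the shared summary statistics of Anscombe's quartet.
from statistics import mean, variance

x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
x4   = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
xs = [x123, x123, x123, x4]

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

for i, (x, y) in enumerate(zip(xs, ys), start=1):
    b = corr(x, y) * (variance(y) / variance(x)) ** 0.5   # slope of the least-squares line
    a = mean(y) - b * mean(x)                             # intercept
    print(f"dataset {i}: mean(x)={mean(x):.2f} var(x)={variance(x):.2f} "
          f"mean(y)={mean(y):.2f} var(y)={variance(y):.3f} "
          f"r={corr(x, y):.3f} fit: y={a:.2f}+{b:.3f}x")
```

All four datasets print essentially the same summary line, which is exactly Anscombe's point: only the plots tell them apart.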

 

Is that not its leitmotif? Whoever sees it will also come to appreciate how much the relationships among concepts, and the order in which they come, matter!

Covariance

In probability theory and statistics, covariance is a measure of the joint variability of two random variables.[1] If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, i.e., the variables tend to show similar behavior, the covariance is positive.[2] For example, as a balloon is blown up it gets larger in all dimensions. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. If a sealed balloon is squashed in one dimension then it will expand in the other two. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which in addition to serving as a descriptor of the sample, also serves as an estimated value of the population parameter.

Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as[3]

\operatorname {cov} (X,Y)=\operatorname {E} {{\big [}(X-\operatorname {E} [X])(Y-\operatorname {E} [Y]){\big ]}},

where E[X] is the expected value of X, also known as the mean of X. The covariance is also sometimes denoted \sigma_{XY} or \sigma(X,Y), in analogy to variance. By using the linearity property of expectations, this can be simplified to

\begin{aligned}\operatorname{cov}(X,Y)&=\operatorname{E}\left[\left(X-\operatorname{E}[X]\right)\left(Y-\operatorname{E}[Y]\right)\right]\\&=\operatorname{E}\left[XY-X\operatorname{E}[Y]-\operatorname{E}[X]Y+\operatorname{E}[X]\operatorname{E}[Y]\right]\\&=\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y]-\operatorname{E}[X]\operatorname{E}[Y]+\operatorname{E}[X]\operatorname{E}[Y]\\&=\operatorname{E}[XY]-\operatorname{E}[X]\operatorname{E}[Y].\end{aligned}

However, when \operatorname {E} [XY]\approx \operatorname {E} [X]\operatorname {E} [Y], this last equation is prone to catastrophic cancellation when computed with floating point arithmetic and thus should be avoided in computer programs when the data have not been centered beforehand.[4] Numerically stable algorithms should be preferred in this case.
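
A small sketch of that cancellation problem: on data with a large common offset, the "naive" formula E[XY] − E[X]E[Y] loses precision, while the centered two-pass formula does not. The offset, slope, and function names below are purely illustrative.

```python
# Two ways to estimate a (population-style) covariance from samples:
# the naive formula E[XY] - E[X]E[Y] and the centered two-pass formula.
# With a large common offset the naive version suffers catastrophic cancellation.
import random

def naive_cov(xs, ys):
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)

def stable_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

random.seed(0)
offset = 1e9                      # large uncentered mean
xs = [offset + random.gauss(0, 1) for _ in range(10_000)]
ys = [offset + 0.5 * (x - offset) + random.gauss(0, 1) for x in xs]

print("naive :", naive_cov(xs, ys))   # noticeably wrong because of cancellation
print("stable:", stable_cov(xs, ys))  # close to the true covariance 0.5
```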

 

Joint probability distribution

In the study of probability, given at least two random variables X, Y, …, that are defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables). These in turn can be used to find two other types of distributions: the marginal distribution giving the probabilities for any one of the variables with no reference to any specific ranges of values for the other variables, and the conditional probability distribution giving the probabilities for any subset of the variables conditional on particular values of the remaining variables.

[Figure: many sample observations (black) from a joint probability distribution, with the marginal densities shown as well]

Examples

Coin Flips

Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes of the first and second coin flips respectively. If a coin displays "heads" then the associated random variable is 1, and it is 0 otherwise. The joint probability mass function of A and B defines probabilities for each pair of outcomes. All possible outcomes are

(A=0,B=0),(A=0,B=1),(A=1,B=0),(A=1,B=1)

Since each outcome is equally likely, the joint probability mass function becomes

P(A,B)=1/4

when A,B\in \{0,1\}. Since the coin flips are independent, the joint probability mass function is the product of the marginals:

P(A,B)=P(A)P(B).

In general, each coin flip is a Bernoulli trial, and the sequence of flips forms a Bernoulli process.

Dice Rolls

Consider the roll of a fair die and let A = 1 if the number is even (i.e. 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3, or 5) and B = 0 otherwise.

  1 2 3 4 5 6
A 0 1 0 1 0 1
B 0 1 1 0 1 0

Then, the joint distribution of A and B, expressed as a probability mass function, is

\mathrm{P}(A=0,B=0)=\mathrm{P}(\{1\})=\tfrac{1}{6},\qquad \mathrm{P}(A=1,B=0)=\mathrm{P}(\{4,6\})=\tfrac{2}{6},

\mathrm{P}(A=0,B=1)=\mathrm{P}(\{3,5\})=\tfrac{2}{6},\qquad \mathrm{P}(A=1,B=1)=\mathrm{P}(\{2\})=\tfrac{1}{6}.

These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1.
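
A tiny enumeration sketch reproduces this joint probability mass function exactly, and also shows that, unlike the coin example, A and B are not independent here:

```python
# Enumerate the fair die to build the joint pmf of A (even) and B (prime),
# then compare P(A,B) against the product of marginals P(A)P(B).
from fractions import Fraction
from collections import Counter

joint = Counter()
for d in range(1, 7):
    a = int(d % 2 == 0)          # A = 1 if the roll is even
    b = int(d in (2, 3, 5))      # B = 1 if the roll is prime
    joint[(a, b)] += Fraction(1, 6)

p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (0, 1)}
p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (0, 1)}

for (a, b), p in sorted(joint.items()):
    print(f"P(A={a}, B={b}) = {p}   P(A={a})P(B={b}) = {p_a[a] * p_b[b]}")
```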

 

With these in hand, can we grasp what 'statistical correlation' is actually saying?

Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or two sets of data. Correlation is any of a broad class of statistical relationships involving dependence, though in common usage it most often refers to the extent to which two variables have a linear relationship with each other. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; however, correlation is not sufficient to demonstrate the presence of such a causal relationship (i.e., correlation does not imply causation).

Formally, dependence refers to any situation in which random variables do not satisfy a mathematical condition of probabilistic independence. In loose usage, correlation can refer to any departure of two or more random variables from independence, but technically it refers to any of several more specialized types of relationship between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may exist even if one is a nonlinear function of the other). Other correlation coefficients have been developed to be more robust than the Pearson correlation – that is, more sensitive to nonlinear relationships.[1][2][3] Mutual information can also be applied to measure dependence between two variables.

Pearson’s product-moment coefficient

The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, or “Pearson’s correlation coefficient”, commonly called simply “the correlation coefficient”. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.[4]

The population correlation coefficient ρX,Y between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as:

\rho _{X,Y}=\mathrm {corr} (X,Y)={\mathrm {cov} (X,Y) \over \sigma _{X}\sigma _{Y}}={E[(X-\mu _{X})(Y-\mu _{Y})] \over \sigma _{X}\sigma _{Y}},

where E is the expected value operator, cov means covariance, and corr is a widely used alternative notation for the correlation coefficient.

The Pearson correlation is defined only if both of the standard deviations are finite and nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation cannot exceed 1 in absolute value. The correlation coefficient is symmetric: corr(X,Y) = corr(Y,X).

The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse) linear relationship (anticorrelation),[5] and some value in the open interval (−1, 1) in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.

If the variables are independent, Pearson’s correlation coefficient is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables. For example, suppose the random variable X is symmetrically distributed about zero, and Y = X2. Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated. However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.

If we have a series of n measurements of X and Y written as xi and yi for i = 1, 2, …, n, then the sample correlation coefficient rxy can be used to estimate the population Pearson correlation ρX,Y between X and Y. The sample correlation coefficient is written:

r_{xy}=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{(n-1)s_{x}s_{y}}=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}},

where \bar{x} and \bar{y} are the sample means of X and Y, and sx and sy are the sample standard deviations of X and Y.

This can also be written as:

\begin{aligned}r_{xy}&=\frac{\sum x_{i}y_{i}-n\bar{x}\bar{y}}{n s_{x}s_{y}}\\&=\frac{n\sum x_{i}y_{i}-\sum x_{i}\sum y_{i}}{\sqrt{n\sum x_{i}^{2}-\left(\sum x_{i}\right)^{2}}~\sqrt{n\sum y_{i}^{2}-\left(\sum y_{i}\right)^{2}}}.\end{aligned}

If x and y are results of measurements that contain measurement error, the realistic limits on the correlation coefficient are not −1 to +1 but a smaller range.[6]

For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson’s product-moment coefficient.
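
A short sketch of the sample formula above, applied to the first Anscombe dataset (whose correlation should come out near 0.816):

```python
# Sample Pearson correlation coefficient, written directly from
# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)^2) * sum((y - ȳ)^2)).
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))   # ≈ 0.816
```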

 

And then work through how the 'autocorrelation' of a time series is actually computed??

Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Unit root processes, trend stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.

Definitions

Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance.

Statistics

In statistics, the autocorrelation of a random process is the correlation between values of the process at different times, as a function of the two times or of the time lag. Let X be a stochastic process, and t be any point in time. (t may be an integer for a discrete-time process or a real number for a continuous-time process.) Then Xt is the value (or realization) produced by a given run of the process at time t. Suppose that the process has mean μt and variance σt2 at time t, for each t. Then the definition of the autocorrelation between times s and t is

R(s,t)={\frac {\operatorname {E} [(X_{t}-\mu _{t})(X_{s}-\mu _{s})]}{\sigma _{t}\sigma _{s}}}\,,

where "E" is the expected value operator. Note that this expression is not well defined for all time series or processes, because the mean may not exist, or the variance may be zero (for a constant process) or infinite (for processes whose distribution lacks well-behaved moments, such as certain kinds of power law). If the function R is well defined, its value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.

If Xt is a wide-sense stationary process then the mean μ and the variance σ2 are time-independent, and further the autocorrelation depends only on the lag between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time. This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the more familiar form

  R(\tau )={\frac {\operatorname {E} [(X_{t}-\mu )(X_{t+\tau }-\mu )]}{\sigma ^{2}}},\,

and the fact that this is an even function can be stated as

R(\tau )=R(-\tau ).\,

It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by σ2 and use the term “autocorrelation” interchangeably with “autocovariance”. However, the normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.
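
A minimal sketch of the stationary-case estimator: the sample autocorrelation of a noisy sinusoid, normalized by the lag-0 value so that R(0) = 1. The series length, period, and noise level are arbitrary example choices.

```python
# Sample autocorrelation of a time series, assuming wide-sense stationarity:
# R(tau) ≈ sum_t (x_t - mean)(x_{t+tau} - mean) / sum_t (x_t - mean)^2.
import math, random

def sample_autocorr(x, max_lag):
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / var
            for k in range(max_lag + 1)]

random.seed(1)
period = 50
series = [math.sin(2 * math.pi * t / period) + random.gauss(0, 0.5)
          for t in range(1000)]

acf = sample_autocorr(series, 100)
print(acf[0], acf[period // 2], acf[period])   # ≈ 1, strongly negative, strongly positive
```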

 

So that, with luck, we may avoid the fallacies lurking between correlation and causation!!

Correlation does not imply causation

"Correlation does not imply causation" is a phrase used in statistics to emphasize that a correlation between two variables does not imply that one causes the other.[1][2] Many statistical tests calculate correlation between variables. A few go further, using correlation as a basis for testing a hypothesis of a true causal relationship; examples are the Granger causality test and convergent cross mapping.

The counter-assumption, that “correlation proves causation,” is considered a questionable cause logical fallacy in that two events occurring together are taken to have a cause-and-effect relationship. This fallacy is also known as cum hoc ergo propter hoc, Latin for “with this, therefore because of this,” and “false cause.” A similar fallacy, that an event that follows another was necessarily a consequence of the first event, is sometimes described as post hoc ergo propter hoc (Latin for “after this, therefore because of this.”).

For example, in a widely studied case, numerous epidemiological studies showed that women taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better-than-average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than a direct cause and effect, as had been supposed.[3]

As with any logical fallacy, identifying that the reasoning behind an argument is flawed does not imply that the resulting conclusion is false. In the instance above, if the trials had found that hormone replacement therapy does in fact reduce the incidence of coronary heart disease, the assumption of causality would have been correct, although the logic behind the assumption would still have been flawed.


Time Series: When Fluctuation Meets Dissipation

The stability of the macroscopic world and the effectiveness of thermodynamics led Boltzmann to apply the laws of motion and statistical methods to systems built from enormous numbers of microscopic particles, hoping to derive thermodynamics as a whole from them. Along this historical path, one thing that troubled Boltzmann was the breaking of a symmetry: how can every single particle obey time symmetry while the system they make up does not?

[Figure: a teeter-totter (see-saw) illustrating time symmetry. Which way will it finally tip??]

The 'time symmetry' (T-symmetry) of a physical system means that the laws of physics remain unchanged under the time-reversal transformation T: t \mapsto -t. For example, Newton's second law \vec{F} = m \frac{d^2 \vec{r}}{d(-t)^2} = m \frac{d^2 \vec{r}}{dt^2} is time symmetric. If a particle moves from an initial state (\vec{r}_i, \vec{p}_i) along a trajectory to a final state (\vec{r}_f, \vec{p}_f), then, were time to run backwards, the particle would retrace that trajectory from the final state (\vec{r}_f, \vec{p}_f) back to the initial state (\vec{r}_i, \vec{p}_i).

Thus, for physical and chemical systems, one arrives at the principle of microscopic reversibility:

Corresponding to every individual process there is a reverse process, and in a state of equilibrium the average rate of every process is equal to the average rate of its reverse process.

However, carried over to the macroscopic world, this principle of microscopic reversibility clashes with the maximum-entropy picture of thermodynamics. In 1872 Boltzmann put forward his H-theorem:

H(t) = \int_0^{\infty} f(E,t) \left[ \log\!\left(\frac{f(E,t)}{\sqrt{E}}\right) - 1 \right] dE, where f(E,t) is the energy distribution function at time t, and f(E,t)\,dE is the number of particles whose kinetic energy lies between E and E + dE. Reportedly, Boltzmann hoped that the methods of statistical mechanics would let him derive the irreversibility behind maximum entropy S.


[Figure: Maxwell's demon]

It can be described as follows: suppose an insulated container is divided into two halves, with a 'door' controlled by a 'demon'. As the particles in the container fly about, they keep arriving at the door, and the demon likes to sort the 'fast' and the 'slow' particles into the two halves; one half therefore ends up at a higher 'temperature' than the other.

For, five years earlier, Maxwell had already conceived this thought experiment:

… if we conceive of a being whose faculties are so sharpened that he can follow every molecule in its course, such a being, whose attributes are as essentially finite as our own, would be able to do what is impossible to us. For we have seen that molecules in a vessel full of air at uniform temperature are moving with velocities by no means uniform, though the mean velocity of any great number of them, arbitrarily selected, is almost exactly uniform. Now let us suppose that such a vessel is divided into two portions, A and B, by a division in which there is a small hole, and that a being, who can see the individual molecules, opens and closes this hole, so as to allow only the swifter molecules to pass from A to B, and only the slower molecules to pass from B to A. He will thus, without expenditure of work, raise the temperature of B and lower that of A, in contradiction to the second law of thermodynamics.

There were also those, Johann Loschmidt among them, who objected to Boltzmann's H-theorem: if there is a motion of a system from time t0 to time t1 to time t2 that leads to a steady decrease of H (increase of entropy) with time, then there is another allowed state of motion of the system at t1, found by reversing all the velocities, in which H must increase. This revealed that one of Boltzmann's key assumptions, molecular chaos, or the Stosszahlansatz (that all particle velocities are completely uncorrelated), did not follow from Newtonian dynamics.

─── excerpted from 《物理哲學·下》

 

Could it really be that the basic principle of 'detailed balance' is wrong??

Detailed balance

The principle of detailed balance is formulated for kinetic systems which are decomposed into elementary processes (collisions, or steps, or elementary reactions): At equilibrium, each elementary process should be equilibrated by its reverse process.

History

The principle of detailed balance was explicitly introduced for collisions by Ludwig Boltzmann. In 1872, he proved his H-theorem using this principle.[1] The arguments in favor of this property are founded upon microscopic reversibility.[2]

Five years before Boltzmann, James Clerk Maxwell used the principle of detailed balance for gas kinetics with the reference to the principle of sufficient reason.[3] He compared the idea of detailed balance with other types of balancing (like cyclic balance) and found that “Now it is impossible to assign a reason” why detailed balance should be rejected (pg. 64).

Albert Einstein in 1916 used the principle of detailed balance in a background for his quantum theory of emission and absorption of radiation.[4]

In 1901, Rudolf Wegscheider introduced the principle of detailed balance for chemical kinetics.[5] In particular, he demonstrated that the irreversible cycles A_1 \to A_2 \to \cdots \to A_n \to A_1 are impossible and found explicitly the relations between kinetic constants that follow from the principle of detailed balance. In 1931, Lars Onsager used these relations in his works,[6] for which he was awarded the 1968 Nobel Prize in Chemistry.

The principle of detailed balance has been used in Markov chain Monte Carlo methods since their invention in 1953.[7][8] In particular, in the Metropolis–Hastings algorithm and in its important particular case, Gibbs sampling, it is used as a simple and reliable condition to provide the desirable equilibrium state.
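
A minimal Metropolis sketch for a discrete target distribution, showing how the acceptance rule enforces detailed balance π(i)q(i,j)a(i→j) = π(j)q(j,i)a(j→i) and therefore drives the chain to the desired equilibrium. The four target weights are arbitrary illustrative numbers.

```python
# Random-walk Metropolis sampler on states {0,1,2,3} with unnormalized target
# weights w. With a symmetric proposal, accepting i -> j with probability
# min(1, w[j]/w[i]) satisfies detailed balance, so the chain converges to w / sum(w).
import random

w = [1.0, 2.0, 4.0, 8.0]          # arbitrary unnormalized target weights
random.seed(0)

state, counts = 0, [0, 0, 0, 0]
for _ in range(200_000):
    proposal = (state + random.choice((-1, 1))) % len(w)   # symmetric ring proposal
    if random.random() < min(1.0, w[proposal] / w[state]):
        state = proposal
    counts[state] += 1

total = sum(counts)
print([round(c / total, 3) for c in counts])      # ≈ [0.067, 0.133, 0.267, 0.533]
print([round(x / sum(w), 3) for x in w])          # target distribution
```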

Now, the principle of detailed balance is a standard part of the university courses in statistical mechanics, physical chemistry, chemical and physical kinetics.[9][10][11]

Microscopical background

The microscopic “reversing of time” turns at the kinetic level into the “reversing of arrows”: the elementary processes transform into their reverse processes. For example, the reaction

\sum_{i}\alpha_{i}A_{i} \to \sum_{j}\beta_{j}B_{j} \quad \text{transforms into} \quad \sum_{j}\beta_{j}B_{j} \to \sum_{i}\alpha_{i}A_{i}

and conversely. (Here, A_i, B_j are symbols of components or states, and \alpha_i, \beta_j \ge 0 are coefficients.) The equilibrium ensemble should be invariant with respect to this transformation because of microreversibility and the uniqueness of thermodynamic equilibrium. This leads us immediately to the concept of detailed balance: each process is equilibrated by its reverse process.

This reasoning is based on three assumptions:

  1. A_i does not change under time reversal;
  2. Equilibrium is invariant under time reversal;
  3. The macroscopic elementary processes are microscopically distinguishable. That is, they represent disjoint sets of microscopic events.

Any of these assumptions may be violated.[12] For example, Boltzmann's collision can be represented as A_v + A_w \to A_{v'} + A_{w'}, where A_v is a particle with velocity v. Under time reversal, A_v transforms into A_{-v}. Therefore, the collision is transformed into the reverse collision by the PT transformation, where P is the space inversion and T is the time reversal. Detailed balance for Boltzmann's equation requires PT-invariance of the collisions' dynamics, not just T-invariance. Indeed, after time reversal the collision A_v + A_w \to A_{v'} + A_{w'} transforms into A_{-v'} + A_{-w'} \to A_{-v} + A_{-w}. For detailed balance we need it transformed into A_{v'} + A_{w'} \to A_v + A_w. For this purpose, we need to apply additionally the space reversal P. Therefore, for detailed balance in Boltzmann's equation not T-invariance but PT-invariance is needed.

Equilibrium may not be T- or PT-invariant even if the laws of motion are invariant. This non-invariance may be caused by spontaneous symmetry breaking. There exist nonreciprocal media (for example, some bi-isotropic materials) without T and PT invariance.[12]

If different macroscopic processes are sampled from the same elementary microscopic events, then macroscopic detailed balance may be violated even when microscopic detailed balance holds.[12][13]

Now, after almost 150 years of development, the scope of validity and the violations of detailed balance in kinetics seem to be clear.

……

Detailed balance and entropy increase

For many systems of physical and chemical kinetics, detailed balance provides sufficient conditions for the strict increase of entropy in isolated systems. For example, the famous Boltzmann H-theorem[1] states that, according to the Boltzmann equation, the principle of detailed balance implies positivity of entropy production. The Boltzmann formula (1872) for entropy production in rarefied gas kinetics with detailed balance[1][2] served as a prototype of many similar formulas for dissipation in mass action kinetics[15] and generalized mass action kinetics[16] with detailed balance.

Nevertheless, the principle of detailed balance is not necessary for entropy growth. For example, in the linear irreversible cycle A_1 \to A_2 \to A_3 \to A_1, entropy production is positive but the principle of detailed balance does not hold.

Thus, the principle of detailed balance is a sufficient but not necessary condition for entropy increase in Boltzmann kinetics. These relations between the principle of detailed balance and the second law of thermodynamics were clarified in 1887 when Hendrik Lorentz objected to the Boltzmann H-theorem for polyatomic gases.[17] Lorentz stated that the principle of detailed balance is not applicable to collisions of polyatomic molecules.

Boltzmann immediately invented a new, more general condition sufficient for entropy growth.[18] Boltzmann's condition holds for all Markov processes, irrespective of time-reversibility. Later, entropy increase was proved for all Markov processes by a direct method.[19][20] These theorems may be considered as simplifications of the Boltzmann result. Later, this condition was referred to as the "cyclic balance" condition (because it holds for irreversible cycles) or the "semi-detailed balance" or the "complex balance". In 1981, Carlo Cercignani and Maria Lampis proved that the Lorentz arguments were wrong and the principle of detailed balance is valid for polyatomic molecules.[21] Nevertheless, the extended semi-detailed balance conditions invented by Boltzmann in this discussion remain a remarkable generalization of detailed balance.

───

 

Perhaps what should be said is that the principle of detailed balance rests on stated assumptions, and counterexamples really do exist in nature! Just as there is also such a thing as

Spontaneous symmetry breaking

Spontaneous symmetry breaking is the mode by which certain physical systems realize symmetry breaking. When the natural laws a physical system obeys possess a certain symmetry while the system itself does not, the phenomenon is called spontaneous symmetry breaking.[1]:141[2]:125 It is a spontaneous process by which a system that originally possessed the symmetry ends up no longer possessing it, or no longer exhibiting it, so that the symmetry becomes hidden. Because of spontaneous symmetry breaking, the equations of motion or the Lagrangian of some physical systems respect the symmetry while the lowest-energy solutions do not. Starting from the Lagrangian or the equations of motion that describe the phenomenon, one can analyze and study it.

Symmetry breaking comes in two main kinds: spontaneous symmetry breaking and explicit symmetry breaking. If the Lagrangian of a physical system contains one or more terms that violate a certain symmetry, so that the system's behavior does not possess that symmetry, this is called explicit symmetry breaking.

As the figure shows, imagine a ball sitting on the crown of a sombrero (Mexican hat). The ball is in a rotationally symmetric state: rotating about the hat's central axis leaves its position unchanged. It also sits at a local maximum of gravitational potential energy and is extremely unstable; the slightest perturbation sends it rolling down to some arbitrary position at the bottom of the brim, lowering it to the minimum of potential energy and breaking the rotational symmetry. Although all possible positions at the bottom of the brim are related to one another by the rotational symmetry, the particular position the ball actually ends up in is not rotationally symmetric: rotating about the central axis changes its position.[3]:203

The simple phases and phase transitions of most forms of matter, for example crystals, magnets, and conventional superconductors, can be understood from the viewpoint of spontaneous symmetry breaking. Topological phases of matter, such as the fractional quantum Hall effect, are notable exceptions.[4]

[Figure: computer rendering of the Mexican-hat potential. The peak is rotationally symmetric about the central axis; any particular point in the circular valley is not, so symmetry breaking occurs there.]

 

Just so!! With all of this in mind, can we now make sense of the fluctuation-dissipation theorem?☆

Fluctuation-dissipation theorem

The fluctuation-dissipation theorem (FDT) is a powerful tool in statistical physics for predicting the behavior of systems that obey detailed balance. Given that a system obeys detailed balance, the theorem is a general proof that thermal fluctuations in a physical variable predict the response quantified by the admittance or impedance of the same physical variable, and vice versa. The fluctuation-dissipation theorem applies both to classical and quantum mechanical systems.

The fluctuation-dissipation theorem relies on the assumption that the response of a system in thermodynamic equilibrium to a small applied force is the same as its response to a spontaneous fluctuation. Therefore, the theorem connects the linear response relaxation of a system from a prepared non-equilibrium state to its statistical fluctuation properties in equilibrium.[1] Often the linear response takes the form of one or more exponential decays.

The fluctuation-dissipation theorem was originally formulated by Harry Nyquist in 1928,[2] and later proven by Herbert Callen and Theodore A. Welton in 1951.[3]

Qualitative overview and examples

The fluctuation-dissipation theorem says that when there is a process that dissipates energy, turning it into heat (e.g., friction), there is a reverse process related to thermal fluctuations. This is best understood by considering some examples:

If an object is moving through a fluid, it experiences drag (air resistance or fluid resistance). Drag dissipates kinetic energy, turning it into heat. The corresponding fluctuation is Brownian motion. An object in a fluid does not sit still, but rather moves around with a small and rapidly-changing velocity, as molecules in the fluid bump into it. Brownian motion converts heat energy into kinetic energy—the reverse of drag.
If electric current is running through a wire loop with a resistor in it, the current will rapidly go to zero because of the resistance. Resistance dissipates electrical energy, turning it into heat (Joule heating). The corresponding fluctuation is Johnson noise. A wire loop with a resistor in it does not actually have zero current, it has a small and rapidly-fluctuating current caused by the thermal fluctuations of the electrons and atoms in the resistor. Johnson noise converts heat energy into electrical energy—the reverse of resistance.
When light impinges on an object, some fraction of the light is absorbed, making the object hotter. In this way, light absorption turns light energy into heat. The corresponding fluctuation is thermal radiation (e.g., the glow of a “red hot” object). Thermal radiation turns heat energy into light energy—the reverse of light absorption. Indeed, Kirchhoff’s law of thermal radiation confirms that the more effectively an object absorbs light, the more thermal radiation it emits.

Examples in detail

The fluctuation-dissipation theorem is a general result of statistical thermodynamics that quantifies the relation between the fluctuations in a system at thermal equilibrium and the response of the system to applied perturbations.

The model thus allows, for example, the use of molecular models to predict material properties in the context of linear response theory. The theorem assumes that applied perturbations, e.g., mechanical forces or electric fields, are weak enough that rates of relaxation remain unchanged.

Brownian motion

For example, Albert Einstein noted in his 1905 paper on Brownian motion that the same random forces that cause the erratic motion of a particle in Brownian motion would also cause drag if the particle were pulled through the fluid. In other words, the fluctuation of the particle at rest has the same origin as the dissipative frictional force one must do work against, if one tries to perturb the system in a particular direction.

From this observation Einstein was able to use statistical mechanics to derive the Einstein-Smoluchowski relation

   D = {\mu \, k_B T}

which connects the diffusion constant D and the particle mobility μ, the ratio of the particle’s terminal drift velocity to an applied force. kB is the Boltzmann constant, and T is the absolute temperature.

Thermal noise in a resistor

In 1928, John B. Johnson discovered and Harry Nyquist explained Johnson–Nyquist noise. With no applied current, the mean-square voltage depends on the resistance R, on k_BT, and on the bandwidth \Delta\nu over which the voltage is measured:

   \langle V^2 \rangle = 4Rk_BT\,\Delta\nu.
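
Plugging numbers into the Nyquist formula gives a feel for the size of the effect; the resistor value, temperature, and bandwidth below are just example figures, not taken from the text.

```python
# RMS Johnson-Nyquist noise voltage  sqrt(<V^2>) = sqrt(4 k_B T R Δν)
# for an example resistor at room temperature.
k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                 # temperature, K (assumed)
R = 10e3                  # resistance, ohms (assumed 10 kΩ)
bandwidth = 20e3          # measurement bandwidth, Hz (assumed 20 kHz)

v_rms = (4 * k_B * T * R * bandwidth) ** 0.5
print(f"RMS thermal noise voltage ≈ {v_rms * 1e6:.2f} µV")   # ≈ 1.8 µV
```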


Time Series: The Witch of Agnesi

How does one come to a deeper understanding of an important law:

Law of large numbers

In mathematics and statistics, the law of large numbers is a law describing the outcome of repeating the same experiment a great many times. It tells us that the larger the number of samples, the closer their average tends to be to the expected value.

The law of large numbers matters because it "guarantees" the long-run stability of the averages of certain random events. In repeated trials, as the number of trials grows, the frequency of an event settles toward a stable value; likewise, in measuring a physical quantity, the arithmetic mean of the measured values is also stable. For instance, when we toss a coin, which face lands up is a matter of chance; but once the coin has been tossed enough times, tens of thousands or even hundreds of thousands or millions of times, we find that each face comes up in roughly half of the tosses. Within the accidental there hides the necessary.

A special case of Chebyshev's theorem, Khinchin's theorem, and Bernoulli's law of large numbers all capture this phenomenon, and all are referred to as laws of large numbers.

Forms

The law of large numbers comes chiefly in two forms: the weak law of large numbers and the strong law of large numbers. Both forms assert that the sample mean

{\overline {X}}_{n}={\frac {1}{n}}(X_{1}+\cdots +X_{n})

converges to the true value:

\overline{X}_n \to \mu \qquad \text{for} \qquad n \to \infty,

where X1, X2, … is an infinite sequence of independent and identically distributed, Lebesgue-integrable random variables with expected value E(X1) = E(X2) = … = µ. Lebesgue integrability of Xj means that the expected value E(Xj) exists and is finite.

The assumption of finite variance Var(X1) = Var(X2) = … = σ2 < ∞ is not necessary. A very large or infinite variance makes the convergence slower, but the law of large numbers still holds. The assumption is usually made because it keeps the proof short and simple.

The difference between the strong and the weak form lies in the mode of convergence being asserted. For an explanation of these modes, see convergence of random variables.

 

Sometimes it is not enough merely to know how it is derived:

Proof of the weak law

Given X1, X2, … an infinite sequence of i.i.d. random variables with finite expected value E(X1) = E(X2) = … = µ < ∞, we are interested in the convergence of the sample average

{\overline {X}}_{n}={\tfrac {1}{n}}(X_{1}+\cdots +X_{n}).

The weak law of large numbers states:

Theorem: \overline{X}_n \ \xrightarrow{P}\ \mu \quad \text{when } n \to \infty. \qquad (law. 2)

Proof using Chebyshev’s inequality

This proof uses the assumption of finite variance  \operatorname {Var} (X_{i})=\sigma ^{2} (for all  i). The independence of the random variables implies no correlation between them, and we have that

\operatorname {Var} ({\overline {X}}_{n})=\operatorname {Var} ({\tfrac {1}{n}}(X_{1}+\cdots +X_{n}))={\frac {1}{n^{2}}}\operatorname {Var} (X_{1}+\cdots +X_{n})={\frac {n\sigma ^{2}}{n^{2}}}={\frac {\sigma ^{2}}{n}}.

The common mean μ of the sequence is the mean of the sample average:

E({\overline {X}}_{n})=\mu .

Using Chebyshev’s inequality on  {\overline {X}}_{n} results in

\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\leq {\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.

This may be used to obtain the following:

\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|<\varepsilon )=1-\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\geq 1-{\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.

As n approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

\overline{X}_n \ \xrightarrow{P}\ \mu \quad \text{when } n \to \infty.
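
A numerical sketch of the weak law: for fair-coin flips (µ = 0.5, σ² = 0.25), the fraction of sample means that stray more than ε from µ shrinks as n grows, and stays below the Chebyshev bound σ²/(nε²) used in the proof above. The trial counts and ε are arbitrary choices.

```python
# Empirical check of the weak law of large numbers for Bernoulli(0.5) variables:
# estimate P(|X̄_n - µ| >= ε) for growing n and compare with Chebyshev's bound.
import random

random.seed(2)
mu, sigma2, eps, trials = 0.5, 0.25, 0.05, 1000

for n in (10, 100, 1000, 10_000):
    bad = 0
    for _ in range(trials):
        xbar = sum(random.random() < 0.5 for _ in range(n)) / n
        bad += abs(xbar - mu) >= eps
    print(f"n={n:>6}  P(|X̄-µ|>=ε) ≈ {bad / trials:.3f}  "
          f"Chebyshev bound {sigma2 / (n * eps * eps):.3f}")
```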
 

 

one must also be able to tell whether counterexamples exist:

Cauchy distribution

The Cauchy distribution, also called the Cauchy-Lorentz distribution, is a continuous probability distribution named after Augustin-Louis Cauchy and Hendrik Lorentz. Its probability density function is

f(x; x_0,\gamma) = \frac{1}{\pi\gamma \left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi} \left[ \frac{\gamma}{(x - x_0)^2 + \gamma^2} \right]

where x0 is the location parameter, giving the position of the peak of the distribution, and γ is the scale parameter, equal to half the width at half maximum.

As a probability distribution it is usually called the Cauchy distribution; physicists also know it as the Lorentz distribution or the Breit-Wigner distribution. Its importance in physics is largely due to its being the solution of the differential equation describing forced resonance. In spectroscopy it describes the shape of spectral lines broadened by resonance or by other mechanisms. In what follows the statistical term, Cauchy distribution, will be used.

The special case with x0 = 0 and γ = 1 is called the standard Cauchy distribution, with probability density function

   f(x; 0,1) = \frac{1}{\pi (1 + x^2)}. \!

Properties

Its cumulative distribution function is:

F(x; x_0,\gamma)=\frac{1}{\pi} \arctan\left(\frac{x-x_0}{\gamma}\right)+\frac{1}{2}

and its inverse cumulative distribution function is

F^{-1}(p; x_0,\gamma) = x_0 + \gamma\,\tan(\pi\,(p-1/2)). \!

The mean, variance, and higher moments of the Cauchy distribution are all undefined; its mode and median are well defined and are both equal to x0.

X 表示柯西分布隨機變量,柯西分布的特性函數表示為:

\phi_x(t; x_0,\gamma) = \mathrm{E}(e^{i\,X\,t}) = \exp(i\,x_0\,t-\gamma\,|t|). \!

If U and V are two independent, normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution.

The standard Cauchy distribution is the special case of Student's t-distribution with one degree of freedom.

The Cauchy distribution is a stable distribution: if X \sim \textrm{Stable}(1,0,\gamma,\mu), then X \sim \textrm{Cauchy}(\mu,\gamma).

If X1, …, Xn are independent, identically distributed random variables, each with a Cauchy distribution, then the arithmetic mean (X1 + … + Xn)/n has the same Cauchy distribution. To see this, compute the characteristic function of the sample mean:

\phi_{\overline{X}}(t) = \mathrm{E}\left(e^{i\,\overline{X}\,t}\right) \,\!

where \overline{X} is the sample mean. This example shows that the finite-variance assumption in the central limit theorem cannot be dropped.
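
A sketch of the counterexample in action: the running mean of standard Cauchy samples keeps jumping around no matter how many samples are averaged, exactly because the mean is again standard Cauchy. Sample sizes and the seed are arbitrary.

```python
# The sample mean of standard Cauchy variates is again standard Cauchy,
# so averaging more samples does not make it settle down.
import math, random

random.seed(3)

def cauchy():
    # inverse-CDF sampling: x = tan(pi * (U - 1/2)) for U ~ Uniform(0,1)
    return math.tan(math.pi * (random.random() - 0.5))

total, checkpoints = 0.0, {10, 100, 1_000, 10_000, 100_000, 1_000_000}
for n in range(1, 1_000_001):
    total += cauchy()
    if n in checkpoints:
        print(f"n={n:>9}  running mean = {total / n:+.3f}")
```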

A Lorentzian line shape suits curves that are relatively flat and wide, while a Gaussian line shape suits taller, narrower ones; for cases in between, either will do. Very often a fixed mixture of the two is used, for example 60% Lorentzian and 40% Gaussian.

 

[Figure: imaginary part of the Lorentzian function, Maple complex 3D plot (animation)]

 

and understand clearly why the law fails in such a case:

Explanation of undefined moments

Mean

If a probability distribution has a density function f(x), then the mean is

\int_{-\infty}^\infty x f(x)\,dx. \qquad\qquad (1)\!

The question is now whether this is the same thing as

\int_a^\infty x f(x)\,dx+\int_{-\infty}^a x f(x)\,dx.\qquad\qquad (2) \!

for an arbitrary real number a.

If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy distribution, both the positive and negative terms of (2) are infinite. Hence (1) is undefined.[12]

Note that the Cauchy principal value of the Cauchy distribution is:

  \lim_{a\to\infty}\int_{-a}^a x f(x)\,dx, \!

which is zero, while:

  \lim_{a\to\infty}\int_{-2a}^a x f(x)\,dx, \!

is not zero, as can be seen easily by computing the integral.
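In fact both truncations can be worked out from the antiderivative of x f(x), which for the standard Cauchy density is F(x) = ln(1 + x²)/(2π); the asymmetric limit then tends to −ln(2)/π ≈ −0.22. A brief numerical sketch makes the asymmetry concrete:

```python
# The antiderivative of x f(x) for the standard Cauchy density
# f(x) = 1/(pi (1+x^2)) is F(x) = ln(1+x^2) / (2 pi).  Evaluate the
# symmetric and the asymmetric truncations for growing a.
import math

F = lambda x: math.log(1 + x * x) / (2 * math.pi)

for a in (10.0, 1e3, 1e6):
    symmetric  = F(a) - F(-a)      # always 0
    asymmetric = F(a) - F(-2 * a)  # tends to -ln(2)/pi ≈ -0.2206
    print(f"a={a:>9}:  symmetric={symmetric:+.4f}  asymmetric={asymmetric:+.4f}")
print("-ln(2)/pi =", -math.log(2) / math.pi)
```
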

Various results in probability theory about expected values, such as the strong law of large numbers, will not work in such cases.[12]

Higher moments

The Cauchy distribution does not have finite moments of any order. Some of the higher raw moments do exist and have a value of infinity, for example the raw second moment:

\begin{aligned}\mathrm{E}[X^{2}]&\propto \int_{-\infty}^{\infty}\frac{x^{2}}{1+x^{2}}\,dx=\int_{-\infty}^{\infty}1-\frac{1}{1+x^{2}}\,dx\\[8pt]&=\int_{-\infty}^{\infty}dx-\int_{-\infty}^{\infty}\frac{1}{1+x^{2}}\,dx=\int_{-\infty}^{\infty}dx-\pi =\infty .\end{aligned}

By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to \infty -\infty since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments are undefined, since they are all based on the mean. The variance—which is the second central moment—is likewise non-existent (despite the fact that the raw second moment exists with the value infinity).

The results for higher moments follow from Hölder’s inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.

 

and even try to trace its history:

Witch of Agnesi

The witch of Agnesi is a kind of plane curve.[1][2][3]

Given a circle and a point O on it: for any other point A on the circle, draw the secant OA. Let M be the point of the circle diametrically opposite O. The line OA meets the tangent at M in a point N. The line through N parallel to OM meets the line through A perpendicular to OM at a point P. The locus of P is the witch of Agnesi.

The witch has an asymptote, namely the tangent to the given circle at O.

Equations

Let O be the origin and let M lie on the positive y-axis, and suppose the circle has radius a.

Then the curve has equation y = \frac{8a^3}{x^2 + 4a^2}.

Note that when a = 1/2 the curve reduces to its simplest form: y = \frac{1}{x^2 + 1}.

If \theta is the angle between OM and OA, the curve has parametric equations:

x = 2a\tan\theta, \quad y = 2a\cos^2\theta.

If \theta is the angle between OA and the x-axis, the parametric equations are:

x = 2a\cot\theta, \quad y = 2a\sin^2\theta.

[Figure: the witch of Agnesi]
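
A small sketch that samples the curve from the second parametric form above and checks it against the Cartesian equation y = 8a³/(x² + 4a²); the choice a = 1/2 reproduces the simplest form 1/(x² + 1).

```python
# Sample the witch of Agnesi from its parametric equations
# x = 2a cot(theta), y = 2a sin^2(theta), and verify the Cartesian form.
import math

a = 0.5                                   # radius of the generating circle
for deg in range(10, 171, 20):            # avoid theta = 0 and 180° (cot blows up)
    t = math.radians(deg)
    x = 2 * a / math.tan(t)               # 2a * cot(theta)
    y = 2 * a * math.sin(t) ** 2
    y_cartesian = 8 * a**3 / (x**2 + 4 * a**2)
    print(f"theta={deg:3d}°  x={x:+.3f}  y={y:.3f}  cartesian y={y_cartesian:.3f}")
```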

History

Pierre de Fermat studied the curve as early as 1630. In 1703 Grandi gave a construction for it, and in 1718 he proposed naming it versoria, the word for the rope that turns a sail, giving the curve the Italian name versiera.[4]

In 1748 Maria Agnesi published her celebrated Instituzioni analitiche ad uso della gioventù italiana, in which the curve still carried the name versiera given by Grandi.[4] As it happened, the Italian word Aversiera/Versiera, derived from the Latin Adversarius, was an epithet of the Devil, "the adversary of God", and a synonym of "witch".[5] Perhaps for this reason the Cambridge professor John Colson mistranslated the name of the curve. Many modern works on Agnesi and on the curve offer different guesses as to how the mistranslation came about.[6][7][8] Struik holds that:

The word versiera derives from the Latin vertere, but the latter is also an abbreviation of the Italian avversiera (she-devil). Some wit in England translated it as "witch", and this silly pun still lives on in most English textbooks. The curve had already appeared in Fermat's writings (Oeuvres, I, 279-280; III, 233-234); the name versiera is due to Grandi, and in Newton's classification of curves it is the 63rd kind… The first to use the word witch for the curve may have been Williamson, in his Integral Calculus of 1875.[9]

Stephen Stigler, on the other hand, believes that Grandi himself was playing on the word.[10]

Applications

Beyond its theoretical properties, the witch of Agnesi also turns up in real life, although these applications have only been properly understood in the late 20th and early 21st centuries. The curve arises when building mathematical models of several physical phenomena.[11] Its equation approximates the spectral line shapes of optical and X-ray lines, and it also gives the power dissipated in a resonant circuit.

The witch of Agnesi has the same form as the probability density function of the Cauchy distribution.

The cross-section of a smooth hill also resembles the witch, and in mathematical modelling the curve has been used as a generic obstacle placed in a flow.[12][13]

 

and make sure of the conditions under which the reasoning holds:

Markov's inequality

In probability theory, Markov's inequality gives an upper bound on the probability that a function of a random variable is greater than or equal to some positive number. Although it is named after the Russian mathematician Andrey Markov, the inequality had appeared in earlier work, including that of Markov's teacher, Pafnuty Chebyshev.

Markov's inequality ties probabilities to expectations, giving a loose but still useful bound on the cumulative distribution function of a random variable.

One application of Markov's inequality: no more than 1/5 of the population can have an income greater than five times the average income.

[Figure: Markov's inequality gives an upper bound on the probability that f(x) exceeds a given level \epsilon (the red line in the figure); the bound involves the mean of f.]

Statement

X為一非負隨機變量,則

\mathrm{P}(X \ge a) \le \frac{\mathrm{E}(X)}{a}.[1]

In measure-theoretic terms, Markov's inequality states that if (X, Σ, μ) is a measure space, ƒ is a measurable extended-real-valued function, and \epsilon > 0, then

\mu (\{x\in X:|f(x)|\geq \epsilon \})\leq {1 \over \epsilon }\int _{X}|f|\,d\mu .

The inequality above is sometimes referred to as Chebyshev's inequality.[2]

Extended version for monotonically increasing functions

φ是定義在非負實數上的單調增加函數,且其值非負,X是一個隨機變量,a ≥ 0,且φ(a) > 0,則

\mathbb{P}(|X| \ge a) \le \frac{\mathbb{E}(\varphi(|X|))}{\varphi(a)}

Deriving Chebyshev's inequality

Chebyshev's inequality uses the variance to bound the probability that a random variable deviates far from its mean:

  \Pr(|X-{\textrm {E}}(X)|\geq a)\leq {\frac {{\textrm {Var}}(X)}{a^{2}}},

for any a > 0, where Var(X) is the variance of X, defined as:

\operatorname {Var}(X)=\operatorname {E}[(X-\operatorname {E}(X))^{2}].

Starting from Markov's inequality, Chebyshev's inequality follows by considering the random variable

(X-\operatorname {E}(X))^{2}

to which Markov's inequality yields the following result:

\Pr((X-\operatorname {E}(X))^{2}\geq a^{2})\leq {\frac {\operatorname {Var}(X)}{a^{2}}},
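
A numerical sketch checking both bounds on an exponential sample (an arbitrary non-negative test distribution): the observed tail probabilities never exceed the Markov or Chebyshev bounds.

```python
# Check Markov's and Chebyshev's inequalities empirically on an
# exponential distribution with mean 1.
import random

random.seed(4)
xs = [random.expovariate(1.0) for _ in range(100_000)]
n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

for a in (1.0, 2.0, 4.0):
    p_markov = sum(x >= a for x in xs) / n
    p_cheby = sum(abs(x - mean) >= a for x in xs) / n
    print(f"a={a}: P(X>=a)={p_markov:.4f} <= E(X)/a={mean / a:.4f} | "
          f"P(|X-E(X)|>=a)={p_cheby:.4f} <= Var/a^2={var / a**2:.4f}")
```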


Time Series: The Galton Board

Anyone who wants to understand the 'trend' of a time series such as

X_t = \alpha + \beta t + \text{white noise}

can hardly avoid knowing something about

Regression analysis

Regression analysis is a statistical method for analyzing data. Its purpose is to find out whether two or more variables are related, in what direction, and how strongly, and to build a mathematical model so that the variable the researcher is interested in can be predicted from observed variables. More concretely, regression analysis helps us understand how the dependent variable changes when a single independent variable is varied. In general, regression analysis lets us estimate the conditional expectation of the dependent variable given the independent variables.

Regression analysis builds a model of the relationship between a dependent variable Y (the response variable) and independent variables X (the explanatory variables). Simple linear regression uses a single independent variable X; multiple regression uses more than one (X_1, X_2, \ldots, X_i).

Origins

The earliest form of regression was the method of least squares, published by Legendre in 1805[1] and by Gauss in 1809.[2] Both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (mostly comets, and later the newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821,[3] including a version of the Gauss-Markov theorem.

The term "regression" was first used by Francis Galton.[4][5] Studying the heights of parents and their children, he found that although parents pass their height on to their children, the children's heights nevertheless tend to "regress toward the middle" (that is, toward the population average). The word, as he used it then, did not yet mean quite what it means in regression analysis today.

In the 1950s and 1960s, economists computed regressions on electromechanical desk calculators. Before 1970, it sometimes took up to 24 hours to get the result of a single regression.[6]

 

now, can they? And to understand what the word 'regression' really means, one first has to know about the

Galton board

The Galton board, also known as the bean machine or quincunx, is a device invented by Francis Galton to demonstrate the central limit theorem.[1]

The Galton board is a vertical board with interleaved rows of pegs. Beads dropped from the top bounce randomly to the left or to the right each time they strike a peg, and finally come to rest in one of the bins at the bottom. If the board has n rows of pegs and a bead bounces to the right with probability p at each peg (p = 0.5 when left and right are equally likely), then the probability of its landing in the k-th bin is the binomial probability \binom{n}{k} p^k (1-p)^{n-k}. By the central limit theorem, when n is large enough this distribution approximates a normal distribution; dropping a large number of beads therefore makes the counts in the bins trace out the normal bell curve.[2]

[Figure: Galton's own sketch of the Galton board]
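
A quick simulation sketch of the board; the number of rows, the number of beads, and p are arbitrary choices.

```python
# Simulate a Galton board with n rows of pegs: each bead bounces right with
# probability p at every peg, so its final bin follows Binomial(n, p).
import random
from collections import Counter

random.seed(5)
n_rows, p, n_beads = 12, 0.5, 20_000

bins = Counter(sum(random.random() < p for _ in range(n_rows))
               for _ in range(n_beads))

for k in range(n_rows + 1):
    bar = '#' * (bins[k] // 100)
    print(f"bin {k:2d}: {bins[k]:5d} {bar}")
```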

 

Knowing that much, and seeing that under the 'law of large numbers' the 'binomial distribution' of Figure 7 approximates the 'normal distribution', one is ready to appreciate the point of 'regression toward the mean':

Regression toward the mean

In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement—and if it is extreme on its second measurement, it will tend to have been closer to the average on its first.[1][2][3] To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data.[4]

The conditions under which regression toward the mean occurs depend on the way the term is mathematically defined. Sir Francis Galton first observed the phenomenon in the context of simple linear regression of data points. Galton[5] developed the following model: pellets fall through a quincunx forming a normal distribution centered directly under their entrance point. These pellets could then be released down into a second gallery corresponding to a second measurement occasion. Galton then asked the reverse question, “From where did these pellets come?”

“The answer was not on average directly above. Rather it was on average, more towards the middle, for the simple reason that there were more pellets above it towards the middle that could wander left than there were in the left extreme that could wander to the right, inwards” (p 477)[6]

A less restrictive approach is possible. Regression towards the mean can be defined for any bivariate distribution with identical marginal distributions. Two such definitions exist.[7] One definition accords closely with the common usage of the term “regression towards the mean”. Not all such bivariate distributions show regression towards the mean under this definition. However, all such bivariate distributions show regression towards the mean under the other definition.

Historically, what is now called regression toward the mean has also been called reversion to the mean and reversion to mediocrity.

In finance, the term mean reversion has a different meaning. Jeremy Siegel uses it to describe a financial time series in which “returns can be very unstable in the short run but very stable in the long run.” More quantitatively, it is one in which the standard deviation of average annual returns declines faster than the inverse of the holding period, implying that the process is not a random walk, but that periods of lower returns are systematically followed by compensating periods of higher returns, in seasonal businesses for example.[8]

……

Definition for simple linear regression of data points

This is the definition of regression toward the mean that closely follows Sir Francis Galton‘s original usage.[9]

Suppose there are n data points {yi, xi}, where i = 1, 2, …, n. We want to find the equation of the regression line, i.e. the straight line

y=\alpha +\beta x,\,

which would provide a “best” fit for the data points. (Note that a straight line may not be the appropriate regression curve for the given data points.) Here the “best” will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, numbers α and β solve the following minimization problem:

Find  \min _{\alpha ,\,\beta }Q(\alpha ,\beta ), where Q(\alpha ,\beta )=\sum _{i=1}^{n}{\hat {\varepsilon }}_{i}^{\,2}=\sum _{i=1}^{n}(y_{i}-\alpha -\beta x_{i})^{2}\

Using calculus it can be shown that the values of α and β that minimize the objective function Q are

\begin{aligned}&\hat{\beta}=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\frac{\overline{xy}-\bar{x}\bar{y}}{\overline{x^{2}}-\bar{x}^{2}}=\frac{\operatorname{Cov}[x,y]}{\operatorname{Var}[x]}=r_{xy}\frac{s_{y}}{s_{x}},\\&\hat{\alpha}=\bar{y}-\hat{\beta}\,\bar{x},\end{aligned}

where rxy is the sample correlation coefficient between x and y, sx is the standard deviation of x, and sy is correspondingly the standard deviation of y. Horizontal bar over a variable means the sample average of that variable. For example:  {\overline {xy}}={\tfrac {1}{n}}\textstyle \sum _{i=1}^{n}x_{i}y_{i}\ .

Substituting the above expressions for  {\hat {\alpha }} and  {\hat {\beta }} into  y=\alpha +\beta x,\, yields fitted values

  {\hat {y}}={\hat {\alpha }}+{\hat {\beta }}x,\,

which yields

{\frac {{\hat {y}}-{\bar {y}}}{s_{y}}}=r_{xy}{\frac {x-{\bar {x}}}{s_{x}}}

This shows the role rxy plays in the regression line of standardized data points.

If −1 < rxy < 1, then we say that the data points exhibit regression toward the mean. In other words, if linear regression is the appropriate model for a set of data points whose sample correlation coefficient is not perfect, then there is regression toward the mean. The predicted (or fitted) standardized value of y is closer to its mean than the standardized value of x is to its mean.

───
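
Before moving on, here is a small numerical sketch of the two ideas in the excerpt above: fitting the least-squares line and watching the standardized prediction shrink toward the mean by the factor r_xy. The synthetic data and the 0.7 slope are arbitrary.

```python
# Least-squares fit and regression toward the mean on synthetic data:
# the predicted standardized y is r_xy times the standardized x.
import random
from math import sqrt

random.seed(6)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.7 * xi + random.gauss(0, 1) for xi in x]          # imperfect linear relation

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
sy = sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

beta = r * sy / sx
alpha = my - beta * mx
print(f"fit: y = {alpha:.3f} + {beta:.3f} x,  r = {r:.3f}")

# An x two standard deviations above its mean predicts a y that is only
# 2*r standard deviations above its mean: regression toward the mean.
x_star = mx + 2 * sx
y_hat = alpha + beta * x_star
print("standardized prediction:", round((y_hat - my) / sy, 3), "≈", round(2 * r, 3))
```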

 

With that, shall we also explore what Figure 8 is getting at?!

 The convolution theorem and its applications

What is a convolution?

One of the most important concepts in Fourier theory, and in crystallography, is that of a convolution. Convolutions arise in many guises, as will be shown below. Because of a mathematical property of the Fourier transform, referred to as the convolution theorem, it is convenient to carry out calculations involving convolutions.

But first we should define what a convolution is. Understanding the concept of a convolution operation is more important than understanding a proof of the convolution theorem, but it may be more difficult!

Mathematically, a convolution is defined as the integral over all space of one function at x times another function at u-x. The integration is taken over the variable x (which may be a 1D or 3D variable), typically from minus infinity to infinity over all the dimensions. So the convolution is a function of a new variable u, as shown in the following equations. The cross in a circle is used to indicate the convolution operation.

f \otimes g\,(u) = \int_{-\infty}^{\infty} f(x)\,g(u - x)\,dx

Note that it doesn’t matter which function you take first, i.e. the convolution operation is commutative. We’ll prove that below, but you should think about this in terms of the illustration below. This illustration shows how you can think about the convolution, as giving a weighted sum of shifted copies of one function: the weights are given by the function value of the second function at the shift vector. The top pair of graphs shows the original functions. The next three pairs of graphs show (on the left) the function g shifted by various values of x and, on the right, that shifted function g multiplied by f at the value of x.

[Figure: illustration of the convolution as a weighted sum of shifted copies of one function]

The bottom pair of graphs shows, on the left, the superposition of several weighted and shifted copies of g and, on the right, the integral (i.e. the sum of all the weighted, shifted copies of g). You can see that the biggest contribution comes from the copy shifted by 3, i.e. the position of the peak of f.

If one of the functions is unimodal (has one peak), as in this illustration, the other function will be shifted by a vector equivalent to the position of the peak, and smeared out by an amount that depends on how sharp the peak is. But alternatively we could switch the roles of the two functions, and we would see that the bimodal function g has doubled the peaks of the unimodal function f.

─── excerpted from 《勇闖新世界︰ W!o《卡夫卡村》變形祭︰品味科學‧教具教材‧【專題】 PD‧箱子世界‧摺積》
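
As a concrete companion to the excerpt, here is a discrete convolution computed straight from the definition; the example arrays and function name are purely illustrative. It shows both the commutativity noted above and the "shift and smear" picture described there.

```python
# Discrete convolution (f ⊗ g)[u] = sum_x f[x] g[u - x], the discrete
# analogue of the integral above, computed directly from the definition.
def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out

f = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]        # a spike at position 3
g = [0.25, 0.5, 0.25]                     # a small smoothing kernel

print(convolve(f, g))                     # the kernel, shifted to position 3
print(convolve(f, g) == convolve(g, f))   # convolution is commutative -> True
```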

