Time Series: Generating Functions (I)

Zhuangzi, "Tianxia" (All Under Heaven)

With these sayings Hui Shi made a great display before the whole world and instructed the debaters, and the debaters of the world delighted in them together: an egg has feathers; a chicken has three legs; Ying contains the whole world; a dog may be deemed a sheep; a horse has eggs; a frog has a tail; fire is not hot; mountains issue from mouths; a wheel never touches the ground; the eye does not see; pointing does not reach, and what is reached is never exhausted; a tortoise is longer than a snake; the carpenter's square is not square; a compass cannot make a circle; a mortise does not surround its tenon; the shadow of a flying bird has never moved; swift as the barbed arrow is, there is a time when it is neither moving nor at rest; a puppy is not a dog; a yellow horse and a black ox make three; a white dog is black; an orphan colt never had a mother; take a stick one foot long and remove half of it every day, and in ten thousand generations it will not be exhausted. With such sayings the debaters answered Hui Shi, endlessly all their lives.

What is "a one-foot stick, halved every day, never exhausted in ten thousand generations" really saying? Is it truly the case that

\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = 1 ??

 

 

This question goes back a very long way:

Geometric series

In mathematics, a geometric series is a series with a constant ratio between successive terms. For example, the series

\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots

is geometric, because each successive term can be obtained by multiplying the previous term by 1/2.

Geometric series are among the simplest examples of infinite series with finite sums, although not all of them have this property. Historically, geometric series played an important role in the early development of calculus, and they continue to be central in the study of convergence of series. Geometric series are used throughout mathematics, and they have important applications in physics, engineering, biology, economics, computer science, queueing theory, and finance.

 
[Figure: Each of the purple squares has 1/4 of the area of the next larger square (1/2×1/2 = 1/4, 1/4×1/4 = 1/16, etc.). The sum of the areas of the purple squares is one third of the area of the large square.]

 

so it is easy to miss that it is in fact the crucial key to generating functions:

Ordinary generating functions

Polynomials are a special case of ordinary generating functions, corresponding to finite sequences, or equivalently sequences that vanish after a certain point. These are important in that many finite sequences can usefully be interpreted as generating functions, such as the Poincaré polynomial and others.

A key generating function is that of the constant sequence 1, 1, 1, 1, 1, 1, 1, 1, 1, …, whose ordinary generating function is

  \sum_{n=0}^{\infty}x^n = \frac{1}{1-x}.

The left-hand side is the Maclaurin series expansion of the right-hand side. Alternatively, the right-hand side expression can be justified by multiplying the power series on the left by 1 − x, and checking that the result is the constant power series 1, in other words that all coefficients except the one of x^0 vanish. Moreover, there can be no other power series with this property. The left-hand side therefore designates the multiplicative inverse of 1 − x in the ring of power series.
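As a quick cross-check, here is a minimal Python sketch (assuming SymPy is available; nothing below comes from the original text):

import sympy as sp

x = sp.symbols('x')
# Maclaurin expansion of 1/(1 - x): every coefficient is 1.
print(sp.series(1/(1 - x), x, 0, 8))      # 1 + x + x**2 + ... + x**7 + O(x**8)

# Multiplying the truncated series by (1 - x) telescopes to 1 - x**8:
# all coefficients except that of x**0 cancel.
poly = sum(x**n for n in range(8))
print(sp.expand((1 - x)*poly))            # 1 - x**8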

Expressions for the ordinary generating function of other sequences are easily derived from this one. For instance, the substitution x → ax gives the generating function for the geometric sequence 1, a, a^2, a^3, … for any constant a:

  \sum_{n=0}^{\infty}(ax)^n= \frac{1}{1-ax}.

(The equality also follows directly from the fact that the left-hand side is the Maclaurin series expansion of the right-hand side.) In particular,

  \sum_{n=0}^{\infty}(-1)^nx^n= \frac{1}{1+x}.

One can also introduce regular “gaps” in the sequence by replacing x by some power of x, so for instance for the sequence 1, 0, 1, 0, 1, 0, 1, 0, …. one gets the generating function

  \sum_{n=0}^{\infty}x^{2n}=\frac{1}{1-x^2}.

By squaring the initial generating function, or by finding the derivative of both sides with respect to x and making a change of running variable n → n-1, one sees that the coefficients form the sequence 1, 2, 3, 4, 5, …, so one has

\sum_{n=0}^{\infty}(n+1)x^n= \frac{1}{(1-x)^2},

and the third power has as coefficients the triangular numbers 1, 3, 6, 10, 15, 21, … whose term n is the binomial coefficient \binom{n+2}{2}, so that

\sum_{n=0}^{\infty}\binom{n+2}{2} x^n= \frac{1}{(1-x)^3}.

More generally, for any non-negative integer k and non-zero real value a, it is true that

\sum_{n=0}^{\infty} a^n \binom{n+k}{k} x^n = \frac{1}{(1-ax)^{k+1}}\,.

Note that, since

2\binom{n+2}{2} - 3\binom{n+1}{1} + \binom{n}{0} = 2\,\frac{(n+1)(n+2)}{2} - 3(n+1) + 1 = n^2,

one can find the ordinary generating function for the sequence 0, 1, 4, 9, 16, … of square numbers by a linear combination of binomial-coefficient generating functions: G(n^2;x)=\sum_{n=0}^{\infty}n^2x^n=\frac{2}{(1-x)^3}-\frac{3}{(1-x)^2}+\frac{1}{1-x}=\frac{x(x+1)}{(1-x)^3}.
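A short SymPy sketch (again purely illustrative, assuming SymPy is available) recovers the squares from the closed form just derived:

import sympy as sp

x = sp.symbols('x')
closed_form = x*(x + 1)/(1 - x)**3
poly = sp.series(closed_form, x, 0, 7).removeO()
# Coefficients 0, 1, 4, 9, 16, 25, 36: the squares n**2.
print([poly.coeff(x, n) for n in range(7)])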

 

Let us open with one example that sets the theme. Suppose

    \[A = \sum_{n=0}^{\infty} a_n x^n\]

    \[S = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k \right) x^n\]

then

    \[S = A \cdot \frac{1}{1-x}\]

Hence, to find the sum of the finite geometric series 1 + a + a^2 + \cdots + a^n, one may use the \frac{1}{1 - a x} \cdot \frac{1}{1-x} device. Since

\frac{1}{1 - a x}  \cdot \frac{1}{1-x} = \frac{1}{a-1} \left( \frac{a}{1 - a x} - \frac{1}{1-x} \right)

it follows that 1 + a + a^2 + \cdots + a^n = \frac{1}{a - 1} \left( a \cdot a^n - 1 \right) = \frac{a^{n+1} - 1}{a - 1}.
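A hedged SymPy check of this computation (the exponent n + 1 is the point to watch; everything below is an illustration, assuming SymPy):

import sympy as sp

x, a = sp.symbols('x a')
gf = 1/((1 - a*x)*(1 - x))               # generating function of the partial sums
poly = sp.series(gf, x, 0, 6).removeO()
for n in range(6):
    coeff = poly.coeff(x, n)             # equals 1 + a + ... + a**n
    target = (a**(n + 1) - 1)/(a - 1)
    print(n, sp.simplify(coeff - target) == 0)   # True for every n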


Time Series: Formal Power Series, Their Generation and Origins

A generating function is a device somewhat similar to a bag. Instead of carrying many little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one object to carry, the bag.
George Polya, Mathematics and plausible reasoning (1954)

 

Pólya, in his great work Mathematics and Plausible Reasoning, wrote:

The name “generating function” is due to Laplace. Yet, without giving it a name, Euler used the device of generating functions long before Laplace [..]. He applied this mathematical tool to several problems in Combinatory Analysis and the Theory of Numbers.

He thus points out how the "generating function"

Generating function

In mathematics, the term generating function is used to describe an infinite sequence of numbers (an) by treating them as the coefficients of a series expansion. The sum of this infinite series is the generating function. Unlike an ordinary series, this formal series is allowed to diverge, meaning that the generating function is not always a true function and the “variable” is actually an indeterminate. Generating functions were first introduced by Abraham de Moivre in 1730, in order to solve the general linear recurrence problem.[1] One can generalize to formal series in more than one indeterminate, to encode information about arrays of numbers indexed by several natural numbers.

There are various types of generating functions, including ordinary generating functions, exponential generating functions, Lambert series, Bell series, and Dirichlet series; definitions and examples are given below. Every sequence in principle has a generating function of each type (except that Lambert and Dirichlet series require indices to start at 1 rather than 0), but the ease with which they can be handled may differ considerably. The particular generating function, if any, that is most useful in a given context will depend upon the nature of the sequence and the details of the problem being addressed.

Generating functions are often expressed in closed form (rather than as a series), by some expression involving operations defined for formal series. These expressions in terms of the indeterminate x may involve arithmetic operations, differentiation with respect to x and composition with (i.e., substitution into) other generating functions; since these operations are also defined for functions, the result looks like a function of x. Indeed, the closed form expression can often be interpreted as a function that can be evaluated at (sufficiently small) concrete values of x, and which has the formal series as its series expansion; this explains the designation “generating functions”. However such interpretation is not required to be possible, because formal series are not required to give a convergent series when a nonzero numeric value is substituted for x. Also, not all expressions that are meaningful as functions of x are meaningful as expressions designating formal series; for example, negative and fractional powers of x are examples of functions that do not have a corresponding formal power series.

Generating functions are not functions in the formal sense of a mapping from a domain to a codomain. Generating functions are sometimes called generating series,[2] in that a series of terms can be said to be the generator of its sequence of term coefficients.

 

came by its name. He also said that Euler had long before used this "generating function" method to study number theory:

The "Basel problem" is a famous problem in number theory, first posed by Pietro Mengoli in 1644. Because it had defeated many earlier mathematicians, Euler became famous the moment he solved it in 1735, at the age of twenty-eight. He then generalized the problem, and his ideas were later taken up by Riemann in his 1859 paper "On the Number of Primes Less Than a Given Magnitude", which defined the Riemann ζ function and proved some of its basic properties. Why, then, is it called the "Basel problem"? Because Basel was the hometown of Euler and of the Bernoulli family. Now, what "importance" could a series sum like \sum \limits_{n=1}^\infty \frac{1}{n^2} = \lim \limits_{n \to +\infty}\left(\frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{n^2}\right) possibly have? Even judging only by the history of the "summability" of divergent series, it would take roughly another century, after Cauchy's view of limits had won the day, for the matter to be argued anew!! All the more reason to look at Euler's own historical "argument"!!

[Figures: the Basel problem, \sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}; Euler on a 10 Swiss franc banknote, on a GDR stamp, and on a 1957 USSR stamp; an Euler diagram from logic.]

If the trigonometric function \sin{x} can be written as \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, then dividing by x gives \frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots. The roots of \sin{x} are x = n\cdot\pi, and since we divided by x we must have n \neq 0, so n = \pm1, \pm2, \pm3, \dots. Then \frac{\sin(x)}{x} ought to "equal" \left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right)\left(1 - \frac{x}{3\pi}\right)\left(1 + \frac{x}{3\pi}\right) \cdots, and hence also "equal" \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{4\pi^2}\right)\left(1 - \frac{x^2}{9\pi^2}\right) \cdots. By Newton's identities, the coefficient of the x^2 term is - \left(\frac{1}{\pi^2} + \frac{1}{4\pi^2} + \frac{1}{9\pi^2} + \cdots \right) = -\frac{1}{\pi^2}\sum_{n=1}^{\infty}\frac{1}{n^2}; but the x^2 coefficient of \frac{\sin(x)}{x} is - \frac{1}{3!} = -\frac{1}{6}, so -\frac{1}{6} = -\frac{1}{\pi^2}\sum \limits_{n=1}^{\infty}\frac{1}{n^2}, whence \sum \limits_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}. So was Euler "right"? Or was he in fact "wrong"??

── Excerpted from《【Sonic π】電聲學之電路學《四》之《 V!》‧下》
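A quick numeric sanity check of Euler's result, in plain Python (only the standard library is assumed):

import math

# Partial sum of the Basel series; the tail beyond N terms is roughly 1/N.
partial = sum(1/n**2 for n in range(1, 100_001))
print(partial)          # ~1.64492...
print(math.pi**2 / 6)   # ~1.64493...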

 

Pólya surely understood well the "formal power series"

Formal power series

In mathematics, a formal power series is a generalization of a polynomial, where the number of terms is allowed to be infinite; this implies giving up the possibility of replacing the variable in the polynomial with an arbitrary number. Thus a formal power series differs from a polynomial in that it may have infinitely many terms, and differs from a power series, whose variables can take on numerical values. One way to view a formal power series is as an infinite ordered sequence of numbers. In this case, the powers of the variable are used only to indicate the order of the coefficients, so that the coefficient of  x^{5} is the fifth term in the sequence. In combinatorics, formal power series provide representations of numerical sequences and of multisets, and for instance allow concise expressions for recursively defined sequences regardless of whether the recursion can be explicitly solved; this is known as the method of generating functions. More generally, formal power series can include series with any finite number of variables, and with coefficients in an arbitrary ring.

 

and the disputes that its origins sparked in history:

The Twelve Links of Dependent Origination (十二因緣)

From the Sūtra of Dependent Origination (緣起經), translated by Xuanzang:

The Buddha said: what is the first meaning of dependent origination? It is this: because this exists, that exists; because this arises, that arises. That is to say: ignorance conditions formations, formations condition consciousness, consciousness conditions name-and-form, name-and-form conditions the six sense bases, the six sense bases condition contact, contact conditions feeling, feeling conditions craving, craving conditions grasping, grasping conditions becoming, becoming conditions birth, and birth conditions aging-and-death, giving rise to sorrow, lamentation, pain, grief, and despair. This is called the arising of the whole great mass of suffering, and such is the first meaning of dependent origination.

In logic one says: "if there is □ then there is ○; if there is no ○ then there is no □." Keeping □ while wishing away ○ can hardly avoid contradiction! In the Wei-Jin era Wang Bi put it thus: one is the beginning of number and the utmost of things. It is called the "marvelous being": call it being, yet no form is seen, so it is not being, and hence is called "non-being"; call it non-being, yet things arise through it, so it is not non-being, and hence is called "being". It is being within non-being, the "marvelous being". Now, if we use the identity 1 - x^n = (1 - x)(1 + x + \cdots + x^{n-1}) to compute \frac{1 + x + \cdots + x^{m-1}}{1 + x + \cdots + x^{n-1}}, it equals \frac{1 - x^m}{1 - x^n} = (1 - x^m) \left[1 + (x^n) + { (x^n) }^2 + { (x^n) } ^3 + \cdots \right] = 1 - x^m + x^n - x^{n+m} + x^{2n} - \cdots; should 1 - 1 + 1 - 1 + \cdots then not "equal" \frac{m}{n}? In 1743 Bernoulli opposed Euler's notion of "summability" on exactly these grounds: how could one and the same series possibly have different "sums"?? This author wonders: riding a spaceship with acceleration g in outer space and throwing dice with a Raspberry-Pi-controlled "nano hand", would one always obtain the same points? Does not Newtonian mechanics say that whenever the initial states are the same, the trajectories of the dice must be identical?? The French mathematician of Italian descent Count Joseph Lagrange is said to have offered an account: in fact, for different m, n, viewed as a power series, = 1 - x^m + x^n - x^{n+m} + x^{2n} - \cdots has "gaps of zeros", as in 1 + 0 + 0 + \cdots - 1 + 0 + 0 + \cdots, and this differs in "form" from 1 - 1 + 1 - 1 + \cdots; how then could we expect a priori that the results would be the same!!

─── Excerpted from《【Sonic π】電聲學之電路學《四》之《 V!》‧下》
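Lagrange's observation can be made concrete with a small SymPy sketch; m = 2, n = 3 are hypothetical sample values chosen only for illustration:

import sympy as sp

x = sp.symbols('x')
m, n = 2, 3   # hypothetical sample values
expr = (1 - x**m)/(1 - x**n)
# The expansion has runs of zero coefficients ("gaps"):
print(sp.series(expr, x, 0, 10))   # 1 - x**2 + x**3 - x**5 + x**6 - ...
# Evaluating the closed form at x -> 1 gives m/n, not 1/2:
print(sp.limit(expr, x, 1))        # 2/3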

 

Since generating functions are powerful and run through so many arguments in probability and statistics, we shall spend a number of installments on this method.


Time Series: Anscombe's Quartet

"Seeing is believing": to understand the local and the global relations within statistical data. Is this not what

Anscombe's quartet

安斯庫姆四重奏Anscombe’s quartet)是四組基本的統計特性一致的數據,但由它們繪製出的圖表則截然不同。每一組數據都包括了11個(x,y)點。這四組數據由統計學家弗朗西斯·安斯庫姆(Francis Anscombe)於1973年構造,他的目的是用來說明在分析數據前先繪製圖表的重要性,以及離群值對統計的影響之大。

[Figure: plots of the four datasets of Anscombe's quartet]

The four datasets share the following statistical properties:

Property                          Value
Mean of x                         9
Variance of x                     11
Mean of y                         7.50 (to 2 decimal places)
Variance of y                     4.122 or 4.127 (to 3 decimal places)
Correlation between x and y       0.816 (to 3 decimal places)
Linear regression line            y = 3.00 + 0.500x (to 2 and 3 decimal places respectively)

Among the four plots, the one from the first dataset (top left) looks the most "normal": one sees two random variables that are correlated. The plot of the second dataset (top right) clearly shows that the relation between the two variables is nonlinear. In the third (bottom left) a linear relation does hold, but a single outlier shifts the regression line and lowers the correlation coefficient from 1 to 0.816. Finally, in the fourth example (bottom right), the two variables have no linear relation at all, yet a single outlier is enough to produce a high correlation coefficient.

Edward Tufte uses Anscombe's quartet on the very first page of his book The Visual Display of Quantitative Information to illustrate the importance of plotting data.

The actual values of the four datasets are listed below; the x values of the first three datasets are identical.

Anscombe's quartet
  I           II          III         IV
x    y      x    y      x    y      x    y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
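The table can be checked directly; here is a NumPy sketch (data transcribed from the table above; NumPy is assumed to be available):

import numpy as np

x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
for i, y in enumerate(ys):
    x, y = (x4 if i == 3 else x123), np.array(y)
    slope, intercept = np.polyfit(x, y, 1)
    print(f"set {i+1}: mean_x={x.mean():.0f} var_x={x.var(ddof=1):.0f} "
          f"mean_y={y.mean():.2f} var_y={y.var(ddof=1):.3f} "
          f"r={np.corrcoef(x, y)[0, 1]:.3f} fit: y={intercept:.2f}+{slope:.3f}x")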

 

 

takes as its main theme? Whoever sees this will come to appreciate the importance of the relations between concepts and of their proper order!

Covariance

In probability theory and statistics, covariance is a measure of the joint variability of two random variables.[1] If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, i.e., the variables tend to show similar behavior, the covariance is positive.[2] For example, as a balloon is blown up it gets larger in all dimensions. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. If a sealed balloon is squashed in one dimension then it will expand in the other two. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.

A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which in addition to serving as a descriptor of the sample, also serves as an estimated value of the population parameter.

Definition

The covariance between two jointly distributed real-valued random variables X and Y with finite second moments is defined as[3]

\operatorname{cov}(X,Y) = \operatorname{E}{\big[}(X-\operatorname{E}[X])(Y-\operatorname{E}[Y]){\big]},

where E[X] is the expected value of X, also known as the mean of X. The covariance is also sometimes denoted \sigma_{XY}, in analogy to variance. By using the linearity property of expectations, this can be simplified to

\begin{aligned}\operatorname{cov}(X,Y)&=\operatorname{E}\left[\left(X-\operatorname{E}\left[X\right]\right)\left(Y-\operatorname{E}\left[Y\right]\right)\right]\\&=\operatorname{E}\left[XY-X\operatorname{E}\left[Y\right]-\operatorname{E}\left[X\right]Y+\operatorname{E}\left[X\right]\operatorname{E}\left[Y\right]\right]\\&=\operatorname{E}\left[XY\right]-\operatorname{E}\left[X\right]\operatorname{E}\left[Y\right]-\operatorname{E}\left[X\right]\operatorname{E}\left[Y\right]+\operatorname{E}\left[X\right]\operatorname{E}\left[Y\right]\\&=\operatorname{E}\left[XY\right]-\operatorname{E}\left[X\right]\operatorname{E}\left[Y\right].\end{aligned}

However, when \operatorname {E} [XY]\approx \operatorname {E} [X]\operatorname {E} [Y], this last equation is prone to catastrophic cancellation when computed with floating point arithmetic and thus should be avoided in computer programs when the data has not been centered before.[4] Numerically stable algorithms should be preferred in this case.
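A minimal sketch of that warning (the large offset 1e8 is an arbitrary illustrative choice; NumPy is assumed):

import numpy as np

rng = np.random.default_rng(0)
x = 1e8 + rng.standard_normal(100_000)   # huge mean, small fluctuations
y = 1e8 + rng.standard_normal(100_000)

naive = np.mean(x*y) - np.mean(x)*np.mean(y)         # cancellation-prone
centered = np.mean((x - x.mean())*(y - y.mean()))    # stable two-pass form
print(naive, centered)   # naive can be wildly off; centered stays near 0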

 

Joint probability distribution

In the study of probability, given at least two random variables X, Y, …, that are defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables). These in turn can be used to find two other types of distributions: the marginal distribution giving the probabilities for any one of the variables with no reference to any specific ranges of values for the other variables, and the conditional probability distribution giving the probabilities for any subset of the variables conditional on particular values of the remaining variables.

[Figure: many sample observations (black) from a joint probability distribution, with the marginal densities shown as well.]

Examples

Coin Flips

Consider the flip of two fair coins; let A and B be discrete random variables associated with the outcomes first and second coin flips respectively. If a coin displays “heads” then associated random variable is 1, and is 0 otherwise. The joint probability density function of A and B defines probabilities for each pair of outcomes. All possible outcomes are

(A=0,B=0),(A=0,B=1),(A=1,B=0),(A=1,B=1)

Since each outcome is equally likely the joint probability density function becomes

P(A,B)=1/4

when A,B\in \{0,1\}. Since the coin flips are independent, the joint probability density function is the product of the marginals:

P(A,B)=P(A)P(B).

In general, each coin flip is a Bernoulli trial and the sequence of flips follows a Bernoulli distribution.

Dice Rolls

Consider the roll of a fair die and let A = 1 if the number is even (i.e. 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3, or 5) and B = 0 otherwise.

  1 2 3 4 5 6
A 0 1 0 1 0 1
B 0 1 1 0 1 0

Then, the joint distribution of A and B, expressed as a probability mass function, is

\mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\quad \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6},

\mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\quad \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}.

These probabilities necessarily sum to 1, since the probability of some combination of A and B occurring is 1.
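The same joint distribution can be enumerated in a few lines of Python (standard library only; the code is an illustration):

from fractions import Fraction
from collections import Counter

joint = Counter()
for face in range(1, 7):
    a = int(face % 2 == 0)          # A = 1 if the face is even
    b = int(face in (2, 3, 5))      # B = 1 if the face is prime
    joint[(a, b)] += Fraction(1, 6)

# {(0, 0): 1/6, (1, 1): 1/6, (0, 1): 1/3, (1, 0): 1/3}; the values sum to 1.
print(dict(joint))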

 

and thereby grasp what "statistical correlation" is about?

Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or two sets of data. Correlation is any of a broad class of statistical relationships involving dependence, though in common usage it most often refers to the extent to which two variables have a linear relationship with each other. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling; however, correlation is not sufficient to demonstrate the presence of such a causal relationship (i.e., correlation does not imply causation).

Formally, dependence refers to any situation in which random variables do not satisfy a mathematical condition of probabilistic independence. In loose usage, correlation can refer to any departure of two or more random variables from independence, but technically it refers to any of several more specialized types of relationship between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may exist even if one is a nonlinear function of the other). Other correlation coefficients have been developed to be more robust than the Pearson correlation – that is, more sensitive to nonlinear relationships.[1][2][3] Mutual information can also be applied to measure dependence between two variables.

Pearson’s product-moment coefficient

The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, or “Pearson’s correlation coefficient”, commonly called simply “the correlation coefficient”. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.[4]

The population correlation coefficient \rho_{X,Y} between two random variables X and Y with expected values \mu_X and \mu_Y and standard deviations \sigma_X and \sigma_Y is defined as:

\rho_{X,Y}=\mathrm{corr}(X,Y)={\mathrm{cov}(X,Y) \over \sigma_{X}\sigma_{Y}}={E[(X-\mu_{X})(Y-\mu_{Y})] \over \sigma_{X}\sigma_{Y}},

where E is the expected value operator, cov means covariance, and corr is a widely used alternative notation for the correlation coefficient.

The Pearson correlation is defined only if both of the standard deviations are finite and nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation cannot exceed 1 in absolute value. The correlation coefficient is symmetric: corr(X,Y) = corr(Y,X).

The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse) linear relationship (anticorrelation),[5] and some value in the open interval (−1, 1) in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.

If the variables are independent, Pearson’s correlation coefficient is 0, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables. For example, suppose the random variable X is symmetrically distributed about zero, and Y = X2. Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated. However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.

If we have a series of n measurements of X and Y written as xi and yi for i = 1, 2, …, n, then the sample correlation coefficient can be used to estimate the population Pearson correlation r between X and Y. The sample correlation coefficient is written:

r_{xy}={\frac{\sum\limits_{i=1}^{n}(x_{i}-{\bar{x}})(y_{i}-{\bar{y}})}{(n-1)s_{x}s_{y}}}={\frac{\sum\limits_{i=1}^{n}(x_{i}-{\bar{x}})(y_{i}-{\bar{y}})}{\sqrt{\sum\limits_{i=1}^{n}(x_{i}-{\bar{x}})^{2}\sum\limits_{i=1}^{n}(y_{i}-{\bar{y}})^{2}}}},

where \bar{x} and \bar{y} are the sample means of X and Y, and s_x and s_y are the sample standard deviations of X and Y.

This can also be written as:

\begin{aligned}r_{xy}&={\frac{\sum x_{i}y_{i}-n{\bar{x}}{\bar{y}}}{ns_{x}s_{y}}}\\&={\frac{n\sum x_{i}y_{i}-\sum x_{i}\sum y_{i}}{{\sqrt{n\sum x_{i}^{2}-(\sum x_{i})^{2}}}~{\sqrt{n\sum y_{i}^{2}-(\sum y_{i})^{2}}}}}.\end{aligned}
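As a sketch (with NumPy assumed and made-up test data), the two sample formulas above agree with each other and with np.corrcoef:

import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(50)
y = 0.5*x + rng.standard_normal(50)          # hypothetical correlated data
n = len(x)
dx, dy = x - x.mean(), y - y.mean()
r1 = (dx*dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())
r2 = (n*(x*y).sum() - x.sum()*y.sum()) / (
    np.sqrt(n*(x**2).sum() - x.sum()**2) * np.sqrt(n*(y**2).sum() - y.sum()**2))
print(r1, r2, np.corrcoef(x, y)[0, 1])       # all three values coincide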

If x and y are results of measurements that contain measurement error, the realistic limits on the correlation coefficient are not −1 to +1 but a smaller range.[6]

For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson’s product-moment coefficient.

 

and master the computation of a time series' "autocorrelation"??

Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Unit root processes, trend stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.

Definitions

Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance.

Statistics

In statistics, the autocorrelation of a random process is the correlation between values of the process at different times, as a function of the two times or of the time lag. Let X be a stochastic process, and t be any point in time. (t may be an integer for a discrete-time process or a real number for a continuous-time process.) Then X_t is the value (or realization) produced by a given run of the process at time t. Suppose that the process has mean \mu_t and variance \sigma_t^2 at time t, for each t. Then the definition of the autocorrelation between times s and t is

R(s,t)={\frac {\operatorname {E} [(X_{t}-\mu _{t})(X_{s}-\mu _{s})]}{\sigma _{t}\sigma _{s}}}\,,

where “E” is the expected value operator. Note that this expression is not well-defined for all time series or processes, because the mean may not exist, or the variance may be zero (for a constant process) or infinite (for processes with distribution lacking well-behaved moments, such as certain types of power law). If the function R is well-defined, its value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.

If X_t is a wide-sense stationary process then the mean \mu and the variance \sigma^2 are time-independent, and further the autocorrelation depends only on the lag between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time. This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the more familiar form

  R(\tau )={\frac {\operatorname {E} [(X_{t}-\mu )(X_{t+\tau }-\mu )]}{\sigma ^{2}}},\,

and the fact that this is an even function can be stated as

R(\tau )=R(-\tau ).\,

It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by \sigma^2 and use the term “autocorrelation” interchangeably with “autocovariance”. However, the normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.
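A minimal estimator of R(τ) in the spirit of the statistics definition above (NumPy assumed; the noisy sine is a made-up test signal):

import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation, normalized by the sample variance."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(x**2)
    return [np.mean(x[:len(x) - tau] * x[tau:]) / var for tau in range(max_lag + 1)]

rng = np.random.default_rng(1)
t = np.arange(500)
signal = np.sin(2*np.pi*t/50) + 0.5*rng.standard_normal(500)
print(np.round(autocorr(signal, 4), 3))   # R(0) = 1, then decaying with lag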

 

so that, one hopes, we may be spared the fallacies that confuse correlation with causation!!

Correlation does not imply causation

“Correlation does not imply causation” is a phrase used in statistics to emphasize that a correlation between two variables does not imply that one causes the other.[1][2] Many statistical tests calculate correlation between variables. A few go further, using correlation as a basis for testing a hypothesis of a true causal relationship; examples are the Granger causality test and convergent cross mapping.

The counter-assumption, that “correlation proves causation,” is considered a questionable cause logical fallacy in that two events occurring together are taken to have a cause-and-effect relationship. This fallacy is also known as cum hoc ergo propter hoc, Latin for “with this, therefore because of this,” and “false cause.” A similar fallacy, that an event that follows another was necessarily a consequence of the first event, is sometimes described as post hoc ergo propter hoc (Latin for “after this, therefore because of this.”).

For example, in a widely studied case, numerous epidemiological studies showed that women taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better-than-average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than a direct cause and effect, as had been supposed.[3]

As with any logical fallacy, identifying that the reasoning behind an argument is flawed does not imply that the resulting conclusion is false. In the instance above, if the trials had found that hormone replacement therapy does in fact have a negative incidence on the likelihood of coronary heart disease the assumption of causality would have been correct, although the logic behind the assumption would still have been flawed.


Time Series: When Fluctuation Meets Dissipation

The stability of the macroscopic world and the effectiveness of thermodynamics led Boltzmann to apply the laws of physical motion and statistical methods to systems made of enormous numbers of microscopic particles, attempting to derive the whole of thermodynamics from them. In the course of this historical endeavor, one thing that troubled Boltzmann was the breaking of symmetry: how can every particle obey time symmetry while the system they constitute does not?

[Figure: a teeter-totter (seesaw) demonstrating time symmetry. Where will it lead in the end??]

The time symmetry (T-symmetry) of a physical system means that the laws of physics are left unchanged by the time-reversal transformation T: t \mapsto -t. For example, Newton's second law \vec{F} = m \frac{d}{d[-t]} \frac{d}{d[-t]} \vec{r} = m \frac{d}{dt} \frac{d}{dt} \vec{r} has time symmetry. If a particle travels from an initial state (\vec{r_i}, \vec{p_i}) along a trajectory to a final state (\vec{r_f}, \vec{p_f}), then when time runs backwards the particle retraces the trajectory from the final state (\vec{r_f}, \vec{p_f}) back to the initial state (\vec{r_i}, \vec{p_i}).

Hence, for physico-chemical systems, there is the principle of microscopic reversibility:

Corresponding to every individual process there is a reverse process, and in a state of equilibrium the average rate of every process is equal to the average rate of its reverse process.

Yet once this principle of microscopic reversibility reaches the macroscopic world, it clashes with the maximum-entropy theory of thermodynamics. In 1872 Boltzmann proposed his H-theorem:

H(t) = \int \limits_0^{\infty} f(E,t) \left[ \log\left(\frac{f(E,t)}{\sqrt{E}}\right) - 1 \right] \, dE, where f(E,t) is the energy distribution function at time t, and f(E,t)\,dE is the number of particles with kinetic energy between E and E + dE. Boltzmann, it is said, hoped to use the methods of statistical mechanics to derive the irreversibility of maximum entropy S.

[Figures: translational motion of gas molecules; Maxwell's demon]

It can be described like this: suppose an adiabatic container is divided into two halves, with a door in the middle controlled by a demon. As the particles in the container dash about, they keep arriving at the door, and this demon likes to sort the fast and slow particles into the two halves; as a result, one half ends up at a higher temperature than the other.

This was because, five years earlier, Maxwell had conceived a thought experiment:

… if we conceive of a being whose faculties are so sharpened that he can follow every molecule in its course, such a being, whose attributes are as essentially finite as our own, would be able to do what is impossible to us. For we have seen that molecules in a vessel full of air at uniform temperature are moving with velocities by no means uniform, though the mean velocity of any great number of them, arbitrarily selected, is almost exactly uniform. Now let us suppose that such a vessel is divided into two portions, A and B, by a division in which there is a small hole, and that a being, who can see the individual molecules, opens and closes this hole, so as to allow only the swifter molecules to pass from A to B, and only the slower molecules to pass from B to A. He will thus, without expenditure of work, raise the temperature of B and lower that of A, in contradiction to the second law of thermodynamics.

Johann Loschmidt, among others, then objected to Boltzmann's H-theorem as follows: if there is a motion of a system from time t0 to time t1 to time t2 that leads to a steady decrease of H (increase of entropy) with time, then there is another allowed state of motion of the system at t1, found by reversing all the velocities, in which H must increase. This revealed that one of Boltzmann’s key assumptions, molecular chaos, or, the Stosszahlansatz, that all particle velocities were completely uncorrelated, did not follow from Newtonian dynamics.

─── Excerpted from《物理哲學·下》

 

Could the basic principle of detailed balance be in error??

Detailed balance

The principle of detailed balance is formulated for kinetic systems which are decomposed into elementary processes (collisions, or steps, or elementary reactions): At equilibrium, each elementary process should be equilibrated by its reverse process.

History

The principle of detailed balance was explicitly introduced for collisions by Ludwig Boltzmann. In 1872, he proved his H-theorem using this principle.[1] The arguments in favor of this property are founded upon microscopic reversibility.[2]

Five years before Boltzmann, James Clerk Maxwell used the principle of detailed balance for gas kinetics with the reference to the principle of sufficient reason.[3] He compared the idea of detailed balance with other types of balancing (like cyclic balance) and found that “Now it is impossible to assign a reason” why detailed balance should be rejected (pg. 64).

Albert Einstein in 1916 used the principle of detailed balance in a background for his quantum theory of emission and absorption of radiation.[4]

In 1901, Rudolf Wegscheider introduced the principle of detailed balance for chemical kinetics.[5] In particular, he demonstrated that the irreversible cycles A_1 \to A_2 \to \cdots \to A_n \to A_1 are impossible and found explicitly the relations between kinetic constants that follow from the principle of detailed balance. In 1931, Lars Onsager used these relations in his works,[6] for which he was awarded the 1968 Nobel Prize in Chemistry.

The principle of detailed balance has been used in Markov chain Monte Carlo methods since their invention in 1953.[7][8] In particular, in the Metropolis–Hastings algorithm and in its important particular case, Gibbs sampling, it is used as a simple and reliable condition to provide the desirable equilibrium state.

Now, the principle of detailed balance is a standard part of the university courses in statistical mechanics, physical chemistry, chemical and physical kinetics.[9][10][11]

Microscopical background

The microscopic “reversing of time” turns at the kinetic level into the “reversing of arrows”: the elementary processes transform into their reverse processes. For example, the reaction

\sum_i \alpha_i A_i \to \sum_j \beta_j B_j \quad transforms into \quad \sum_j \beta_j B_j \to \sum_i \alpha_i A_i

and conversely. (Here, A_i, B_j are symbols of components or states, \alpha_i, \beta_j \geq 0 are coefficients). The equilibrium ensemble should be invariant with respect to this transformation because of microreversibility and the uniqueness of thermodynamic equilibrium. This leads us immediately to the concept of detailed balance: each process is equilibrated by its reverse process.

This reasoning is based on three assumptions:

  1. A_i does not change under time reversal;
  2. Equilibrium is invariant under time reversal;
  3. The macroscopic elementary processes are microscopically distinguishable. That is, they represent disjoint sets of microscopic events.

Any of these assumptions may be violated.[12] For example, Boltzmann's collision can be represented as A_v + A_w \to A_{v'} + A_{w'}, where A_v is a particle with velocity v. Under time reversal A_v transforms into A_{-v}. Therefore, the collision is transformed into the reverse collision by the PT transformation, where P is the space inversion and T is the time reversal. Detailed balance for Boltzmann's equation requires PT-invariance of collisions' dynamics, not just T-invariance. Indeed, after the time reversal the collision A_v + A_w \to A_{v'} + A_{w'} transforms into A_{-v'} + A_{-w'} \to A_{-v} + A_{-w}. For the detailed balance we need transformation into A_{v'} + A_{w'} \to A_v + A_w. For this purpose, we need to apply additionally the space reversal P. Therefore, for the detailed balance in Boltzmann's equation not T-invariance but PT-invariance is needed.

Equilibrium may not be T- or PT-invariant even if the laws of motion are invariant. This non-invariance may be caused by the spontaneous symmetry breaking. There exist nonreciprocal media (for example, some bi-isotropic materials) without T and PT invariance.[12]

If different macroscopic processes are sampled from the same elementary microscopic events then macroscopic detailed balance may be violated even when microscopic detailed balance holds.[12][13]

Now, after almost 150 years of development, the scope of validity and the violations of detailed balance in kinetics seem to be clear.

……

Detailed balance and entropy increase

For many systems of physical and chemical kinetics, detailed balance provides sufficient conditions for the strict increase of entropy in isolated systems. For example, the famous Boltzmann H-theorem[1] states that, according to the Boltzmann equation, the principle of detailed balance implies positivity of entropy production. The Boltzmann formula (1872) for entropy production in rarefied gas kinetics with detailed balance[1][2] served as a prototype of many similar formulas for dissipation in mass action kinetics[15] and generalized mass action kinetics[16] with detailed balance.

Nevertheless, the principle of detailed balance is not necessary for entropy growth. For example, in the linear irreversible cycle A_1 \to A_2 \to A_3 \to A_1, entropy production is positive but the principle of detailed balance does not hold.

Thus, the principle of detailed balance is a sufficient but not necessary condition for entropy increase in Boltzmann kinetics. These relations between the principle of detailed balance and the second law of thermodynamics were clarified in 1887 when Hendrik Lorentz objected to the Boltzmann H-theorem for polyatomic gases.[17] Lorentz stated that the principle of detailed balance is not applicable to collisions of polyatomic molecules.

Boltzmann immediately invented a new, more general condition sufficient for entropy growth.[18] Boltzmann’s condition holds for all Markov processes, irrespective of time-reversibility. Later, entropy increase was proved for all Markov processes by a direct method.[19][20] These theorems may be considered as simplifications of the Boltzmann result. Later, this condition was referred to as the “cyclic balance” condition (because it holds for irreversible cycles) or the “semi-detailed balance” or the “complex balance”. In 1981, Carlo Cercignani and Maria Lampis proved that the Lorentz arguments were wrong and the principle of detailed balance is valid for polyatomic molecules.[21] Nevertheless, the extended semi-detailed balance conditions invented by Boltzmann in this discussion remain the remarkable generalization of the detailed balance.

───

 

Perhaps what should be said is that the principle of detailed balance rests on assumed premises, and counterexamples do in fact exist in nature! Just as there is also

Spontaneous symmetry breaking

Spontaneous symmetry breaking is a mode by which certain physical systems realize symmetry breaking. When the natural laws obeyed by a physical system possess some symmetry while the system itself does not, the phenomenon is called spontaneous symmetry breaking.[1]:141[2]:125 It is a spontaneous process through which a system that originally possessed the symmetry ends up no longer possessing it, or no longer exhibiting it, so that the symmetry becomes hidden. Because of spontaneous symmetry breaking, the equations of motion or the Lagrangian of some physical systems obey the symmetry, while their lowest-energy solutions do not. Starting from the Lagrangian or the equations of motion that describe the phenomenon, one can analyze and study it.

Symmetry breaking divides mainly into spontaneous symmetry breaking and explicit symmetry breaking. If the Lagrangian of a physical system contains one or more terms that violate some symmetry, so that the system's behavior lacks that symmetry, this is called explicit symmetry breaking.

As illustrated in the figure, suppose a ball sits at the crown of a sombrero (Mexican hat). The ball is in a rotationally symmetric state: rotations about the hat's central axis leave its position unchanged. But it also sits at a local maximum of the gravitational potential, extremely unstable; the slightest perturbation sends it rolling down to some position in the trough of the brim, lowering it to a minimum of the potential and breaking the rotational symmetry. Although all possible trough positions are related to one another by the rotational symmetry, the position the ball actually realizes is not rotationally symmetric: a rotation about the central axis changes the ball's position.[3]:203

The simple phases and phase transitions of most matter, such as crystals, magnets, and conventional superconductors, can be understood from the viewpoint of spontaneous symmetry breaking. Topological phases of matter, such as the fractional quantum Hall effect, are notable exceptions.[4]

[Figure: computer rendering of the Mexican-hat potential. The crown is rotationally symmetric about the central axis; any single position in the trough is not, and symmetry breaking occurs at whatever trough position is realized.]

 

such a thing as well!! Granted all this, can one come to understand the fluctuation-dissipation theorem? ☆

Fluctuation-dissipation theorem

The fluctuation-dissipation theorem (FDT) is a powerful tool in statistical physics for predicting the behavior of systems that obey detailed balance. Given that a system obeys detailed balance, the theorem is a general proof that thermal fluctuations in a physical variable predict the response quantified by the admittance or impedance of the same physical variable, and vice versa. The fluctuation-dissipation theorem applies both to classical and quantum mechanical systems.

The fluctuation-dissipation theorem relies on the assumption that the response of a system in thermodynamic equilibrium to a small applied force is the same as its response to a spontaneous fluctuation. Therefore, the theorem connects the linear response relaxation of a system from a prepared non-equilibrium state to its statistical fluctuation properties in equilibrium.[1] Often the linear response takes the form of one or more exponential decays.

The fluctuation-dissipation theorem was originally formulated by Harry Nyquist in 1928,[2] and later proven by Herbert Callen and Theodore A. Welton in 1951.[3]

Qualitative overview and examples

The fluctuation-dissipation theorem says that when there is a process that dissipates energy, turning it into heat (e.g., friction), there is a reverse process related to thermal fluctuations. This is best understood by considering some examples:

If an object is moving through a fluid, it experiences drag (air resistance or fluid resistance). Drag dissipates kinetic energy, turning it into heat. The corresponding fluctuation is Brownian motion. An object in a fluid does not sit still, but rather moves around with a small and rapidly-changing velocity, as molecules in the fluid bump into it. Brownian motion converts heat energy into kinetic energy—the reverse of drag.
If electric current is running through a wire loop with a resistor in it, the current will rapidly go to zero because of the resistance. Resistance dissipates electrical energy, turning it into heat (Joule heating). The corresponding fluctuation is Johnson noise. A wire loop with a resistor in it does not actually have zero current, it has a small and rapidly-fluctuating current caused by the thermal fluctuations of the electrons and atoms in the resistor. Johnson noise converts heat energy into electrical energy—the reverse of resistance.
When light impinges on an object, some fraction of the light is absorbed, making the object hotter. In this way, light absorption turns light energy into heat. The corresponding fluctuation is thermal radiation (e.g., the glow of a “red hot” object). Thermal radiation turns heat energy into light energy—the reverse of light absorption. Indeed, Kirchhoff’s law of thermal radiation confirms that the more effectively an object absorbs light, the more thermal radiation it emits.

Examples in detail

The fluctuation-dissipation theorem is a general result of statistical thermodynamics that quantifies the relation between the fluctuations in a system at thermal equilibrium and the response of the system to applied perturbations.

The model thus allows, for example, the use of molecular models to predict material properties in the context of linear response theory. The theorem assumes that applied perturbations, e.g., mechanical forces or electric fields, are weak enough that rates of relaxation remain unchanged.

Brownian motion

For example, Albert Einstein noted in his 1905 paper on Brownian motion that the same random forces that cause the erratic motion of a particle in Brownian motion would also cause drag if the particle were pulled through the fluid. In other words, the fluctuation of the particle at rest has the same origin as the dissipative frictional force one must do work against, if one tries to perturb the system in a particular direction.

From this observation Einstein was able to use statistical mechanics to derive the Einstein-Smoluchowski relation

D = \mu\, k_B T

which connects the diffusion constant D and the particle mobility μ, the ratio of the particle's terminal drift velocity to an applied force. k_B is the Boltzmann constant, and T is the absolute temperature.

Thermal noise in a resistor

In 1928, John B. Johnson discovered and Harry Nyquist explained Johnson–Nyquist noise. With no applied current, the mean-square voltage depends on the resistance R, on k_B T, and on the bandwidth \Delta\nu over which the voltage is measured:

   \langle V^2 \rangle = 4Rk_BT\,\Delta\nu.
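Plugging hypothetical numbers into Nyquist's formula (a 1 MΩ resistor at 300 K over a 10 kHz bandwidth; all values are illustrative):

# Root-mean-square Johnson noise voltage, <V^2> = 4 R k_B T * bandwidth.
k_B = 1.380649e-23        # Boltzmann constant, J/K
R, T, bandwidth = 1e6, 300.0, 1e4
v_rms = (4*R*k_B*T*bandwidth) ** 0.5
print(f"{v_rms*1e6:.2f} microvolts rms")   # about 12.9 uV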


Time Series: The Witch of Agnesi

How does one come to understand an important law in depth:

Law of large numbers

數學統計學中,大數定律又稱大數法則、大數律,是描述相當多次數重複實驗的結果的定律。根據這個定律知道,樣本數量越多 ,則其平均就越趨近期望值

The law of large numbers matters because it "guarantees" the long-run stability of the means of certain random events. In repeated trials, as the number of trials grows, the frequency of an event tends to a stable value; likewise, in the practice of measuring physical quantities, the arithmetic mean of the measured values is also stable. For example, toss a coin: which face lands up is a matter of chance, but after enough tosses, tens of thousands or even hundreds of thousands or millions, each face will have landed up in roughly half of the tosses. Within the accidental lies the necessary.

A special case of Chebyshev's theorem, Khinchin's theorem, and Bernoulli's law of large numbers all capture this phenomenon, and all are called laws of large numbers.

Forms

There are two main forms of the law of large numbers: the weak law and the strong law. Both forms state, beyond doubt, that the sample mean

{\overline {X}}_{n}={\frac {1}{n}}(X_{1}+\cdots +X_{n})

converges to the true value:

{\overline {X}}_{n}\,\to \,\mu \qquad {\textrm {for}}\qquad n\to \infty ,

where X1, X2, … is an infinite sequence of independent and identically distributed, Lebesgue-integrable random variables with expected value E(X1) = E(X2) = … = µ. Lebesgue integrability of Xj means that the expected value E(Xj) exists and is finite.

The assumption of finite variance Var(X1) = Var(X2) = … = σ^2 < ∞ is not necessary. A very large or infinite variance slows the convergence, but the law of large numbers still holds. The assumption is usually adopted to keep the proof simple.

The difference between the strong and the weak form lies in the mode of convergence asserted. For an explanation of these modes, see convergence of random variables.

 

Sometimes one must not only know its derivation:

Proof of the weak law

Given X1, X2, … an infinite sequence of i.i.d. random variables with finite expected value E(X1) = E(X2) = … = µ < ∞, we are interested in the convergence of the sample average

{\overline {X}}_{n}={\tfrac {1}{n}}(X_{1}+\cdots +X_{n}).

The weak law of large numbers states:

Theorem: \overline{X}_n \ \xrightarrow{P}\ \mu \quad \textrm{when}\ n \to \infty. \qquad (law. 2)

Proof using Chebyshev’s inequality

This proof uses the assumption of finite variance  \operatorname {Var} (X_{i})=\sigma ^{2} (for all  i). The independence of the random variables implies no correlation between them, and we have that

\operatorname {Var} ({\overline {X}}_{n})=\operatorname {Var} ({\tfrac {1}{n}}(X_{1}+\cdots +X_{n}))={\frac {1}{n^{2}}}\operatorname {Var} (X_{1}+\cdots +X_{n})={\frac {n\sigma ^{2}}{n^{2}}}={\frac {\sigma ^{2}}{n}}.

The common mean μ of the sequence is the mean of the sample average:

E({\overline {X}}_{n})=\mu .

Using Chebyshev’s inequality on  {\overline {X}}_{n} results in

\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\leq {\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.

This may be used to obtain the following:

\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|<\varepsilon )=1-\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\geq 1-{\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.

As n approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

\overline{X}_n \ \xrightarrow{P}\ \mu \quad \textrm{when}\ n \to \infty.
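A small simulation in the spirit of the proof (NumPy assumed; fair-die rolls with µ = 3.5 and ε = 0.1 are an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(2)
for n in (10, 100, 1000, 10000):
    rolls = rng.integers(1, 7, size=(2000, n))    # 2000 repeated experiments
    means = rolls.mean(axis=1)
    # Empirical P(|Xbar_n - 3.5| >= 0.1) shrinks roughly like sigma^2/(n*eps^2).
    print(n, np.mean(np.abs(means - 3.5) >= 0.1))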
 

 

but also know whether there are counterexamples:

Cauchy distribution

The Cauchy distribution, also called the Cauchy-Lorentz distribution, is a continuous probability distribution named after Augustin Louis Cauchy and Hendrik Lorentz. Its probability density function is

f(x; x_0,\gamma) = \frac{1}{\pi\gamma \left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi} \left[ \frac{\gamma}{(x - x_0)^2 + \gamma^2} \right]

where x0 is the location parameter, giving the position of the peak of the distribution, and γ is the scale parameter, the half-width at half-maximum.

As a probability distribution it is usually called the Cauchy distribution; physicists also call it the Lorentz distribution or the Breit-Wigner distribution. Much of its importance in physics comes from its being the solution of the differential equation describing forced resonance. In spectroscopy it describes the shape of spectral lines broadened by resonance or by other mechanisms. The statistical term "Cauchy distribution" is used in what follows.

The special case x0 = 0 and γ = 1 is called the standard Cauchy distribution, with probability density function

   f(x; 0,1) = \frac{1}{\pi (1 + x^2)}. \!

Properties

Its cumulative distribution function is:

F(x; x_0,\gamma)=\frac{1}{\pi} \arctan\left(\frac{x-x_0}{\gamma}\right)+\frac{1}{2}

and the inverse cumulative distribution function of the Cauchy distribution is

F^{-1}(p; x_0,\gamma) = x_0 + \gamma\,\tan(\pi\,(p-1/2)). \!
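The inverse CDF gives a direct sampling recipe; a minimal sketch (NumPy assumed, parameters chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(6)
x0, gamma = 0.0, 1.0               # standard Cauchy, for illustration
u = rng.uniform(size=5)
samples = x0 + gamma*np.tan(np.pi*(u - 0.5))   # inverse-transform sampling
print(samples)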

The mean, variance, and higher moments of the Cauchy distribution are all undefined; its mode and median are well defined, and both equal x0.

X 表示柯西分布隨機變量,柯西分布的特性函數表示為:

\phi_x(t; x_0,\gamma) = \mathrm{E}(e^{i\,X\,t}) = \exp(i\,x_0\,t-\gamma\,|t|). \!

If U and V are two independent normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution.

The standard Cauchy distribution is the special case of Student's t-distribution with one degree of freedom.

The Cauchy distribution is a stable distribution: if X\sim\textrm{Stable}(1,0,\gamma,\mu), then X\sim\textrm{Cauchy}(\mu,\gamma).

If X1, …, Xn are independent, identically Cauchy-distributed random variables, then their arithmetic mean (X1 + … + Xn)/n has the same Cauchy distribution. To prove this, compute the characteristic function of the sample mean:

\phi_{\overline{X}}(t) = \mathrm{E}\left(e^{i\,\overline{X}\,t}\right) \,\!

where \overline{X} is the sample mean. This example shows that the finite-variance hypothesis of the central limit theorem cannot be dropped.
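A simulation sketch of this counterexample (NumPy assumed): running means of Cauchy samples never settle down.

import numpy as np

rng = np.random.default_rng(3)
samples = rng.standard_cauchy(1_000_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (100, 10_000, 1_000_000):
    print(n, running_mean[n - 1])   # keeps jumping; no convergence to a mean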

The Lorentzian line shape suits flatter, wider curves, while the Gaussian suits taller, narrower ones; in intermediate cases either will do. In many situations a mixture of the two is used, for instance 60% Lorentzian and 40% Gaussian.

 

[Figure: imaginary part of the Lorentzian function, Maple complex 3D plot and animation]

 

and see clearly why the law fails there:

Explanation of undefined moments

Mean

If a probability distribution has a density function f(x), then the mean is

\int_{-\infty}^\infty x f(x)\,dx. \qquad\qquad (1)\!

The question is now whether this is the same thing as

\int_a^\infty x f(x)\,dx+\int_{-\infty}^a x f(x)\,dx.\qquad\qquad (2) \!

for an arbitrary real number a.

If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy distribution, both the positive and negative terms of (2) are infinite. Hence (1) is undefined.[12]

Note that the Cauchy principal value of the Cauchy distribution is:

  \lim_{a\to\infty}\int_{-a}^a x f(x)\,dx, \!

which is zero, while:

  \lim_{a\to\infty}\int_{-2a}^a x f(x)\,dx, \!

is not zero, as can be seen easily by computing the integral.

Various results in probability theory about expected values, such as the strong law of large numbers, will not work in such cases.[12]

Higher moments

The Cauchy distribution does not have finite moments of any order. Some of the higher raw moments do exist and have a value of infinity, for example the raw second moment:

\begin{aligned}\mathrm{E}[X^{2}]&\propto \int_{-\infty}^{\infty}{\frac{x^{2}}{1+x^{2}}}\,dx=\int_{-\infty}^{\infty}1-{\frac{1}{1+x^{2}}}\,dx\\[8pt]&=\int_{-\infty}^{\infty}dx-\int_{-\infty}^{\infty}{\frac{1}{1+x^{2}}}\,dx=\int_{-\infty}^{\infty}dx-\pi =\infty.\end{aligned}

By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, are undefined, which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to \infty -\infty since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments are undefined, since they are all based on the mean. The variance—which is the second central moment—is likewise non-existent (despite the fact that the raw second moment exists with the value infinity).

The results for higher moments follow from Hölder’s inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.

 

and even try tracing its history:

The witch of Agnesi (箕舌線)

The 箕舌線 ("winnowing-fan tongue curve") is a kind of plane curve, known in English as the Witch of Agnesi.[1][2][3]

Start with a circle and a point O on it. For any other point A on the circle, draw the secant OA. Let M be the point diametrically opposite O. The secant OA meets the tangent at M at a point N. The line through N parallel to OM and the line through A perpendicular to OM intersect at P. The locus of P is the witch of Agnesi.

The witch has a single asymptote: the tangent to the given circle at O.

Equations

Let O be the origin and let M lie on the positive y-axis; let the radius of the circle be a.

Then the curve has equation y = \frac{8a^3}{x^2 + 4a^2}.

Note that if a = 1/2, the curve reduces to its simplest form: y = \frac{1}{x^2 + 1}.

If \theta is the angle between OM and OA, the curve has parametric equations:

x = 2a\tan\theta,\quad y = 2a\cos^2\theta.

If \theta is the angle between OA and the x-axis, the parametric equations are:

x = 2a\cot\theta,\quad y = 2a\sin^2\theta.

[Figure: the witch of Agnesi]
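A SymPy sketch (assuming SymPy is available) checking that the first parametrization satisfies the Cartesian equation above:

import sympy as sp

a, t = sp.symbols('a t', positive=True)
x = 2*a*sp.tan(t)
y = 2*a*sp.cos(t)**2
# Should simplify to 0, confirming y = 8a^3/(x^2 + 4a^2) on the curve.
print(sp.simplify(y - 8*a**3/(x**2 + 4*a**2)))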

History

Pierre de Fermat studied the curve in 1630. In 1703 Grandi gave a method of constructing it, and in 1718 he proposed naming it versoria, meaning the rope that trims a sail, giving the curve the Italian name versiera.[4]

In 1748 Maria Agnesi published her celebrated Instituzioni analitiche ad uso della gioventù italiana, in which the curve kept the name Grandi had given it, versiera[4]. As it happened, the Italian word Aversiera/Versiera of the time, derived from the Latin Adversarius, was an epithet of the Devil, "the adversary of God", and a synonym of "witch"[5]. Perhaps for this reason the Cambridge professor John Colson mistranslated the curve's name. Many modern works on Agnesi and on the curve offer somewhat different guesses about the mistranslation[6][7][8]. Dirk Struik's view is that:

The word versiera is derived from the Latin vertere, but the latter is also an abbreviation of the Italian avversiera (female devil). Some wit in England translated it as "witch" (English: witch), and the amusing pun survives in most English textbooks. The curve had already appeared in Fermat's writings (Oeuvres, I, 279-280; III, 233-234); the name versiera is Grandi's; in Newton's classification of curves it is the 63rd kind... The first to use the word "witch" to describe the curve may have been Williamson, in his 1875 Integral calculus[9].

Stephen Stigler, on the other hand, holds that Grandi himself was playing word games[10].

Applications

Besides its theoretical properties, the witch of Agnesi often shows up in everyday life, though such applications were only properly understood in the late 20th and early 21st centuries. It appears when building mathematical models of certain physical phenomena[11]: the equation approximates the spectral line profiles of visible light and X-rays, and the power dissipated in resonant circuits.

The witch of Agnesi has the same form as the probability density function of the Cauchy distribution.

The cross-section of a smooth hill is also similar to the witch; the curve has been used in mathematical modeling as an obstacle in a flow field[12][13].

 

and pin down the conditions under which the reasoning holds:

Markov's inequality

機率論中,馬爾可夫不等式給出了隨機變量的函數大於等於某正數的機率的上界。雖然它以俄國數學家安德雷·馬爾可夫命名,但該不等式曾出現在一些更早的文獻中,其中包括馬爾可夫的老師–巴夫尼提·列波維奇·切比雪夫

Markov's inequality ties probability to mathematical expectation, giving a loose but still useful bound on the cumulative distribution function of a random variable.

One application of Markov's inequality: no more than 1/5 of the population can have an income exceeding 5 times the average income.

[Figure: Markov's inequality gives an upper bound on the probability that f(x) exceeds a given level \epsilon (marked by the red line); the bound involves the mean of f.]

Statement

X為一非負隨機變量,則

\mathrm{P}(X\geq a)\leq {\frac{\mathrm{E}(X)}{a}}.[1]

In the language of measure theory, Markov's inequality states that if (X, Σ, μ) is a measure space, ƒ is a measurable extended-real-valued function, and \epsilon > 0, then

\mu (\{x\in X:|f(x)|\geq \epsilon \})\leq {1 \over \epsilon }\int _{X}|f|\,d\mu .

The inequality above is sometimes also referred to as Chebyshev's inequality[2].

Extension to monotonically increasing functions

φ是定義在非負實數上的單調增加函數,且其值非負,X是一個隨機變量,a ≥ 0,且φ(a) > 0,則

  {\displaystyle \mathbb {P} (|X|\geq a)\leq {\frac {\mathbb {E} (\varphi (|X|))}{\varphi (a)}}}

Deriving Chebyshev's inequality

Chebyshev's inequality uses the variance to bound the probability that a random variable deviates far from its mean; it can be stated as:

  \Pr(|X-{\textrm {E}}(X)|\geq a)\leq {\frac {{\textrm {Var}}(X)}{a^{2}}},

for any a > 0, where Var(X) is the variance of X, defined as:

\operatorname {Var}(X)=\operatorname {E}[(X-\operatorname {E}(X))^{2}].

Building on Markov's inequality, Chebyshev's inequality follows by considering the random variable

(X-\operatorname {E}(X))^{2}

to which Markov's inequality gives the result

\Pr((X-\operatorname {E}(X))^{2}\geq a^{2})\leq {\frac {\operatorname {Var}(X)}{a^{2}}},
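Finally, a numeric sanity check of both inequalities (NumPy assumed; the exponential(1) distribution, with mean 1 and variance 1, is an arbitrary test case):

import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=1_000_000)
for a in (2.0, 3.0, 5.0):
    markov_bound = x.mean() / a            # bound on P(X >= a)
    cheby_bound = x.var() / a**2           # bound on P(|X - E X| >= a)
    print(a,
          np.mean(x >= a), "<=", round(markov_bound, 4), "|",
          np.mean(np.abs(x - x.mean()) >= a), "<=", round(cheby_bound, 4))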