Time Series: Bertrand's Paradox

A paradox from over a century ago

Bertrand paradox (probability theory)

The Bertrand paradox is a paradox arising from the classical interpretation of probability theory. Joseph Bertrand introduced it in 1888 in his work Calcul des probabilités, as an example to show that probabilities may not be well defined if the mechanism or method that produces the random variable is not clearly specified.

Statement of Bertrand's paradox

Bertrand's paradox goes as follows: consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle?

Bertrand gave three arguments, all apparently valid, yet yielding different results (a Monte Carlo sketch of all three follows the list below).

  1. Random chords, method 1; red = longer than the triangle's side, blue = shorter than the triangle's side

    The "random endpoints" method: choose two random points on the circumference of the circle and draw the chord joining them. To calculate the probability in question, imagine the triangle rotated so that one of its vertices coincides with one of the chord's endpoints. Observe that if the other endpoint lies on the arc between the two opposite vertices, so that the chord crosses a side of the triangle, then the chord is longer than a side of the triangle. That arc is one third of the circumference, so the probability that a random chord is longer than a side of the triangle is one third.

  2. Random chords, method 2

    The "random radius" method: choose a radius of the circle and a point on that radius, then draw the chord through the point perpendicular to the radius. To calculate the probability in question, imagine the triangle rotated so that one side is perpendicular to the radius. The chord is longer than a side of the triangle if the chosen point is nearer the center of the circle than the point where the side of the triangle intersects the radius. The side of the triangle bisects the radius, so the probability that a random chord is longer than a side of the triangle is one half.

  3. Random chords, method 3

    The "random midpoint" method: choose a point anywhere within the circle and construct the chord with the chosen point as its midpoint. The chord is longer than a side of the triangle if the chosen point falls within a concentric circle whose radius is half the radius of the larger circle. The smaller circle has one quarter the area of the larger one, so the probability that a random chord is longer than a side of the triangle is one quarter.
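Since each method is just a different recipe for a random chord, a Monte Carlo comparison makes the disagreement concrete. Below is a minimal sketch in Python (numpy; the radius, sample size, and seed are arbitrary choices of this illustration), using the fact that a chord beats the inscribed triangle's side exactly when it is longer than sqrt(3) times the radius:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000              # chords per method (arbitrary)
R = 1.0                    # circle radius; inscribed triangle side = sqrt(3) * R
side = np.sqrt(3) * R

# Method 1 (random endpoints): two uniform angles on the circumference.
a, b = rng.uniform(0, 2 * np.pi, (2, N))
len1 = 2 * R * np.abs(np.sin((a - b) / 2))

# Method 2 (random radius): uniform distance d from the center to the chord.
d = rng.uniform(0, R, N)
len2 = 2 * np.sqrt(R**2 - d**2)

# Method 3 (random midpoint): midpoint uniform over the disk's area.
r = R * np.sqrt(rng.uniform(0, 1, N))   # sqrt makes the point uniform in area
len3 = 2 * np.sqrt(R**2 - r**2)

for i, lengths in enumerate((len1, len2, len3), start=1):
    print(f"method {i}: P(chord > triangle side) ≈ {np.mean(lengths > side):.4f}")
# Prints approximately 0.3333, 0.5000, 0.2500 -- three answers to one question.
```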

The selection methods can also be visualized as follows. A chord is uniquely determined by its midpoint, and the three methods above give three different distributions of midpoints: methods 1 and 2 produce two different nonuniform distributions, while method 3 produces a uniform one. On the other hand, looking at the chords themselves, the chords of method 2 appear the most uniformly spread, while those of methods 1 and 3 do not.

 

Midpoints of the random chords, method 1

Midpoints of the random chords, method 2

Midpoints of the random chords, method 3

Random chords, method 1

Random chords, method 2

Random chords, method 3

Many other selection methods can be imagined, and each may yield a different probability that a random chord is longer than a side of the triangle.

 

The question remains unsettled to this day. Consider that any "open interval" of real numbers can be put in correspondence with the entire real line; how, then, can one be anything but careful with the "probability measure" on a "sample space"? It is as astonishing as a function that is everywhere continuous yet nowhere differentiable!

In 1872, Karl Theodor Wilhelm Weierstraß, the German father of modern analysis, gave just such a counterintuitive function, everywhere continuous yet nowhere differentiable:

f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x),

where 0 < a < 1 and b is a positive odd integer such that ab > 1 + \frac{3}{2}\pi.

 

That beautiful "Koch snowflake" arises as the limit of continuous line segments.

In 1904 the Swedish mathematician Niels Fabian Helge von Koch, dispensing with Weierstrass's abstract analytic definition, gave the intuitive geometric construction now known as the "Koch snowflake", ……

─── excerpted from 《加百利之號角》 (Gabriel's Horn)!!
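As a quick numerical sketch of the series above (Python with numpy; the values a = 0.5 and b = 13 and the truncation depth are illustrative choices satisfying Weierstrass's condition, since ab = 6.5 > 1 + 3π/2 ≈ 5.71), one can evaluate partial sums; the bound |a^n cos(b^n π x)| ≤ a^n gives uniform convergence, which is exactly why the limit function is continuous:

```python
import numpy as np

def weierstrass(x, a=0.5, b=13, n_terms=30):
    """Partial sum of f(x) = sum_{n>=0} a^n cos(b^n pi x).

    Since |a^n cos(b^n pi x)| <= a^n with 0 < a < 1, the partial sums
    converge uniformly, so the limit is continuous; with a*b = 6.5
    exceeding 1 + 3*pi/2 it is nowhere differentiable.
    """
    n = np.arange(n_terms, dtype=float)   # float avoids integer overflow in b**n
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.sum(a ** n[:, None] * np.cos(b ** n[:, None] * np.pi * x), axis=0)

print(weierstrass([0.0, 0.25, 0.5]))      # sample values of the partial sum
```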

 

One may ask: is the "distance" function defined on the "Koch snowflake" still "continuous" in the limit?
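The construction itself is easy to iterate in code, and doing so makes the delicacy of that question visible: each step multiplies the boundary's total length by 4/3, so arc length along the snowflake diverges even though the curve itself converges. A minimal sketch (Python, numpy only; the iteration depth is arbitrary):

```python
import numpy as np

# Rotation by -60 degrees: places each new peak on the outward side of a
# counter-clockwise-oriented polygon.
ROT = np.array([[0.5, np.sqrt(3) / 2],
                [-np.sqrt(3) / 2, 0.5]])

def koch_step(points):
    """Replace every segment by four: keep the outer thirds and erect an
    equilateral 'bump' on the middle third."""
    out = []
    for p, q in zip(points[:-1], points[1:]):
        v = (q - p) / 3
        a, b = p + v, p + 2 * v
        out += [p, a, a + ROT @ v, b]
    out.append(points[-1])
    return np.array(out)

# Start from a counter-clockwise equilateral triangle and iterate.
curve = np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2], [0, 0]], dtype=float)
for _ in range(4):
    curve = koch_step(curve)

seg = np.diff(curve, axis=0)
print(len(curve), "points, total length =", np.linalg.norm(seg, axis=1).sum())
# Each step multiplies the length by 4/3, so the length grows without bound.
```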

Although Jaynes's argument is illuminating:

Jaynes’s solution using the “maximum ignorance” principle

In his 1973 paper “The Well-Posed Problem“,[2] Edwin Jaynes proposed a solution to Bertrand’s paradox, based on the principle of “maximum ignorance”—that we should not use any information that is not given in the statement of the problem. Jaynes pointed out that Bertrand’s problem does not specify the position or size of the circle, and argued that therefore any definite and objective solution must be “indifferent” to size and position. In other words: the solution must be both scale and translation invariant.

To illustrate: assume that chords are laid at random onto a circle with a diameter of 2, for example by throwing straws onto it from far away. Now another circle with a smaller diameter (e.g., 1.1) is laid into the larger circle. Then the distribution of the chords on that smaller circle needs to be the same as on the larger circle. If the smaller circle is moved around within the larger circle, the probability must not change either. It can be seen very easily that there would be a change for method 3: the chord distribution on the small red circle looks qualitatively different from the distribution on the large circle:

 

[Figure: Bertrand3-translate ru.svg — chord distributions on a small circle translated inside the large circle]

The same occurs for method 1, though it is harder to see in a graphical representation. Method 2 is the only one that is both scale invariant and translation invariant; method 3 is just scale invariant, method 1 is neither.

However, Jaynes did not just use invariances to accept or reject given methods: this would leave the possibility that there is another not yet described method that would meet his common-sense criteria. Jaynes used the integral equations describing the invariances to directly determine the probability distribution. In this problem, the integral equations indeed have a unique solution, and it is precisely what was called “method 2” above, the random radius method.
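That uniqueness claim can at least be spot-checked numerically. In the following sketch (Python/numpy; the position and size of the inner circle are arbitrary), chords are drawn on the large circle by method 2, and among those that also cut a smaller, off-center circle, the fraction longer than the side of that circle's own inscribed triangle should again be 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2_000_000
R = 1.0                                 # large circle centered at the origin
c, r = np.array([0.35, 0.20]), 0.4      # smaller circle; position/size arbitrary

# Method 2 chords on the large circle: direction theta, signed distance d.
theta = rng.uniform(0, 2 * np.pi, N)
d = rng.uniform(-R, R, N)
u = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit normals of the chords

# Distance from the small circle's center to each chord line.
dist = np.abs(u @ c - d)
hits = dist < r                          # chords that also cut the small circle

# On the small circle the chord beats its inscribed triangle's side
# exactly when its distance from that circle's center is below r/2.
p = np.mean(dist[hits] < r / 2)
print(f"P(long chord) on the displaced inner circle ≈ {p:.4f} (method 2 gives 0.5)")
```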

 

…and yet in fact we have no a priori way of knowing that nature admits several different kinds of statistics:

Maxwell–Boltzmann statistics

Fermi–Dirac statistics

Bose–Einstein statistics

 

So why not look back and reconsider whether a "principle of indifference" exists at all?

Principle of indifference

The principle of indifference (also called principle of insufficient reason) is a rule for assigning epistemic probabilities. Suppose that there are n > 1 mutually exclusive and collectively exhaustive possibilities. The principle of indifference states that if the n possibilities are indistinguishable except for their names, then each possibility should be assigned a probability equal to 1/n.

In Bayesian probability, this is the simplest non-informative prior. The principle of indifference is meaningless under the frequency interpretation of probability, in which probabilities are relative frequencies rather than degrees of belief in uncertain propositions, conditional upon state information.

 

Frequentist probability

Frequentist probability or frequentism is an interpretation of probability; it defines an event’s probability as the limit of its relative frequency in a large number of trials. This interpretation supports the statistical needs of experimental scientists and pollsters; probabilities can be found (in principle) by a repeatable objective process (and are thus ideally devoid of opinion). It does not support all needs; gamblers typically require estimates of the odds without experiments.

The development of the frequentist account was motivated by the problems and paradoxes of the previously dominant viewpoint, the classical interpretation. In the classical interpretation, probability was defined in terms of the principle of indifference, based on the natural symmetry of a problem, so, e.g. the probabilities of dice games arise from the natural symmetric 6-sidedness of the cube. This classical interpretation stumbled at any statistical problem that has no natural symmetry for reasoning.

Definition

In the frequentist interpretation, probabilities are discussed only when dealing with well-defined random experiments (or random samples).[1] The set of all possible outcomes of a random experiment is called the sample space of the experiment. An event is defined as a particular subset of the sample space to be considered. For any given event, only one of two possibilities may hold: it occurs or it does not. The relative frequency of occurrence of an event, observed in a number of repetitions of the experiment, is a measure of the probability of that event. This is the core conception of probability in the frequentist interpretation.

Thus, if n_t is the total number of trials and n_x is the number of trials where the event x occurred, the probability P(x) of the event occurring will be approximated by the relative frequency as follows:

P(x) \approx \frac{n_x}{n_t}.

Clearly, as the number of trials is increased, one might expect the relative frequency to become a better approximation of a “true frequency”.

A claim of the frequentist approach is that in the “long run,” as the number of trials approaches infinity, the relative frequency will converge exactly to the true probability:[2]

P(x) = \lim_{n_t \to \infty} \frac{n_x}{n_t}.
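A tiny simulation shows this convergence in action. The sketch below (Python with numpy; the event and the trial counts are arbitrary choices) watches the relative frequency n_x / n_t approach the true probability as n_t grows:

```python
import numpy as np

rng = np.random.default_rng(2)
p_true = 1 / 6                         # e.g. rolling a six with a fair die
rolls = rng.random(100_000) < p_true   # one long sequence of Bernoulli trials

# Relative frequency n_x / n_t after n_t trials, for growing n_t.
for n_t in (10, 100, 1_000, 10_000, 100_000):
    n_x = rolls[:n_t].sum()
    print(f"n_t = {n_t:>6}:  n_x/n_t = {n_x / n_t:.4f}   (true p = {p_true:.4f})")
```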

 

Bayesian probability

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, assigned probabilities represent states of knowledge[1] or belief.[2]

The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses, i.e., the propositions whose truth or falsity is uncertain. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated to a posterior probability in the light of new, relevant data (evidence).[3] The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.

The term “Bayesian” derives from the 18th century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of Bayesian inference.[4] Mathematician Pierre-Simon Laplace pioneered and popularised what is now called Bayesian probability.[5]

Broadly speaking, there are two views on Bayesian probability that interpret the probability concept in different ways. According to the objectivist view, probability represents the state of knowledge, can be interpreted as an extension of logic, and its rules can be justified by Cox’s requirements of rationality and consistency.[1][6] According to the subjectivist view, probability quantifies a personal belief, and its rules can be justified by requirements of rationality and coherence following from the Dutch book argument or from the decision theory and de Finetti’s theorem.[2]

Bayesian methodology

Bayesian methods are characterized by concepts and procedures as follows:

  • The use of random variables or, more generally, unknown quantities,[7] to model all sources of uncertainty in statistical models. This also includes uncertainty resulting from lack of information (see also aleatoric and epistemic uncertainty).
  • The need to determine the prior probability distribution taking into account the available (prior) information.
  • The sequential use of Bayes’ formula: when more data become available, calculate the posterior distribution using Bayes’ formula; subsequently, the posterior distribution becomes the next prior (a short numerical sketch of this loop follows the list).
  • While for the frequentist a hypothesis is a proposition (which must be either true or false), so that the frequentist probability of a hypothesis is either 0 or 1, in Bayesian statistics the probability that can be assigned to a hypothesis can also be in a range from 0 to 1 if the truth value is uncertain.
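As a minimal illustration of the sequential loop in the third point (Python with numpy; the coin, its bias, and the batch sizes are invented for the example), the conjugate Beta prior for a Bernoulli bias reduces each prior-to-posterior update to a parameter increment:

```python
import numpy as np

rng = np.random.default_rng(3)
true_bias = 0.7
alpha, beta = 1.0, 1.0        # Beta(1, 1): the flat, non-informative prior

# Each batch of data updates the posterior, which becomes the next prior.
for batch in range(5):
    flips = rng.random(20) < true_bias
    alpha += flips.sum()              # add observed heads
    beta += (~flips).sum()            # add observed tails
    mean = alpha / (alpha + beta)     # posterior mean of the unknown bias
    print(f"after batch {batch + 1}: Beta({alpha:.0f}, {beta:.0f}), "
          f"posterior mean ≈ {mean:.3f}")
```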

Objective and subjective Bayesian probabilities

Broadly speaking, there are two interpretations on Bayesian probability. For objectivists, probability objectively measures the plausibility of propositions, i.e., probability corresponds to a reasonable belief everyone (even a “robot”) sharing the same knowledge should share in accordance with the rules of Bayesian statistics, which can be justified by Cox’s requirements of rationality and consistency.[1][6] For subjectivists, probability corresponds to a “personal belief”;[2] rationality and coherence allow for substantial variation within the constraints they pose. The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.

 

Classical definition of probability

The classical definition or interpretation of probability is identified[1] with the works of Jacob Bernoulli and Pierre-Simon Laplace. As stated in Laplace’s Théorie analytique des probabilités,

The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.

This definition is essentially a consequence of the principle of indifference. If elementary events are assigned equal probabilities, then the probability of a disjunction of elementary events is just the number of events in the disjunction divided by the total number of elementary events.
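The counting definition is easy to mirror in code. A small sketch (Python standard library only; the two-dice example is my own illustration): enumerate the equally possible cases and divide the favorable count by the total:

```python
from itertools import product

# Classical probability by direct counting: P = favorable / total,
# illustrated with the chance that two fair dice sum to 7.
outcomes = list(product(range(1, 7), repeat=2))   # 36 equally possible cases
favorable = [o for o in outcomes if sum(o) == 7]  # 6 favorable cases
print(len(favorable), "/", len(outcomes), "=", len(favorable) / len(outcomes))
# -> 6 / 36 = 0.1666...
```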

The classical definition of probability was called into question by several writers of the nineteenth century, including John Venn and George Boole.[2] The frequentist definition of probability became widely accepted as a result of their criticism, and especially through the works of R.A. Fisher. The classical definition enjoyed a revival of sorts due to the general interest in Bayesian probability, because Bayesian methods require a prior probability distribution and the principle of indifference offers one source of such a distribution. Classical probability can offer prior probabilities that reflect ignorance which often seems appropriate before an experiment is conducted.