Time Series: The Galton Quincunx

For anyone wanting to understand the "trend" of the time series

X_t = \alpha + \beta t + \text{white noise}

how could they not know about

Regression analysis

Regression analysis is a method in statistics for analyzing data. Its purpose is to determine whether two or more variables are related, and in what direction and how strongly, and to build a mathematical model so that the variable of interest can be predicted from observed variables. More specifically, regression analysis helps one understand how the dependent variable changes when just one independent variable is varied. In general, regression analysis lets us estimate the conditional expectation of the dependent variable given the independent variables.

Regression analysis builds a model of the relationship between a dependent variable Y (also called the response variable) and one or more independent variables X (also called explanatory variables). Simple linear regression uses a single independent variable X; multiple regression uses more than one independent variable (X_1, X_2, \ldots, X_i).
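As a quick illustration (a sketch in Python with numpy; the parameter values are arbitrary choices, not from the text), the trend model X_t = α + βt + white noise at the top of this post is exactly a simple linear regression with t as the single independent variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate X_t = alpha + beta*t + white noise
# (alpha = 2.0 and beta = 0.5 are arbitrary illustration values)
t = np.arange(200)
alpha, beta = 2.0, 0.5
x = alpha + beta * t + rng.normal(scale=3.0, size=t.size)

# Simple linear regression of X_t on t recovers the trend;
# np.polyfit returns coefficients from highest degree down
beta_hat, alpha_hat = np.polyfit(t, x, deg=1)

print(alpha_hat, beta_hat)  # estimates close to 2.0 and 0.5
```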

Origins

The earliest form of regression was the method of least squares, published by Legendre in 1805[1] and by Gauss in 1809[2]. Both Legendre and Gauss applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (at first comets, and later the newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821[3], including a version of the Gauss–Markov theorem.

The term "regression" was first used by Francis Galton[4][5]. Studying the heights of parents and their children, he found that although parents' heights are passed on to their children, the children's heights nevertheless tend to "regress to the mediocre" (that is, toward the population average). The meaning of "regression" then, however, is not quite the same as its meaning today.

In the 1950s and 60s, economists computed regressions on electromechanical desk calculators. Before 1970, it could sometimes take up to 24 hours to receive the result of a single regression.[6]

 

? And to understand what the word "regression" means, one must first know the

Galton board

The Galton board (English: Galton board), also known as the bean machine or the quincunx, is a device invented by Francis Galton to demonstrate the central limit theorem.[1]

The Galton board is a vertically mounted board with staggered rows of pegs. A ball dropped from the top bounces randomly to the left or right each time it strikes a peg, and finally lands in one of the slots at the bottom of the board. Suppose the board has n rows of pegs and that a ball bounces right with probability p after striking a peg (p = 0.5 when left and right are equally likely). Then the probability that a ball lands in the k-th slot is given by the binomial distribution {n \choose k} p^k (1-p)^{n-k}. By the central limit theorem, this distribution approximates a normal distribution when n is sufficiently large, so when a large number of balls are dropped, the counts in the slots trace out the bell curve of the normal distribution.[2]
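The mechanism is easy to simulate. The following sketch (Python with numpy; an illustration, not part of the quoted excerpt) drops many balls through n rows of pegs and compares the slot frequencies with the normal density N(np, np(1−p)):

```python
import numpy as np
from math import sqrt, pi, exp

rng = np.random.default_rng(1)
n_rows, n_balls, p = 20, 100_000, 0.5

# Each row is one peg hit: True means the ball bounces right.
# The final slot index k is simply the number of right bounces.
rights = rng.random((n_balls, n_rows)) < p
bins = rights.sum(axis=1)

freq = np.bincount(bins, minlength=n_rows + 1) / n_balls

# Normal approximation N(np, np(1-p)) evaluated at the slot centres
mu, sigma = n_rows * p, sqrt(n_rows * p * (1 - p))
normal = np.array([exp(-(k - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))
                   for k in range(n_rows + 1)])

max_gap = np.abs(freq - normal).max()
print(max_gap)  # small: the slot counts hug the bell curve
```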

Galton's own schematic of the Galton board

 

! Once it is clear that, under the "law of large numbers", the "binomial distribution" of Figure 7 approximates the "normal distribution", one can appreciate the point of "regression toward the mean"

Regression toward the mean

In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement—and if it is extreme on its second measurement, it will tend to have been closer to the average on its first.[1][2][3] To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data.[4]

The conditions under which regression toward the mean occurs depend on the way the term is mathematically defined. Sir Francis Galton first observed the phenomenon in the context of simple linear regression of data points. Galton[5] developed the following model: pellets fall through a quincunx forming a normal distribution centered directly under their entrance point. These pellets could then be released down into a second gallery corresponding to a second measurement occasion. Galton then asked the reverse question, “From where did these pellets come?”

“The answer was not on average directly above. Rather it was on average, more towards the middle, for the simple reason that there were more pellets above it towards the middle that could wander left than there were in the left extreme that could wander to the right, inwards” (p 477)[6]

A less restrictive approach is possible. Regression towards the mean can be defined for any bivariate distribution with identical marginal distributions. Two such definitions exist.[7] One definition accords closely with the common usage of the term “regression towards the mean”. Not all such bivariate distributions show regression towards the mean under this definition. However, all such bivariate distributions show regression towards the mean under the other definition.

Historically, what is now called regression toward the mean has also been called reversion to the mean and reversion to mediocrity.

In finance, the term mean reversion has a different meaning. Jeremy Siegel uses it to describe a financial time series in which “returns can be very unstable in the short run but very stable in the long run.” More quantitatively, it is one in which the standard deviation of average annual returns declines faster than the inverse of the holding period, implying that the process is not a random walk, but that periods of lower returns are systematically followed by compensating periods of higher returns, in seasonal businesses for example.[8]

……

Definition for simple linear regression of data points

This is the definition of regression toward the mean that closely follows Sir Francis Galton's original usage.[9]

Suppose there are n data points {y_i, x_i}, where i = 1, 2, …, n. We want to find the equation of the regression line, i.e. the straight line

y = \alpha + \beta x,

which would provide a “best” fit for the data points. (Note that a straight line may not be the appropriate regression curve for the given data points.) Here the “best” will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, numbers α and β solve the following minimization problem:

Find \min_{\alpha,\,\beta} Q(\alpha, \beta), where Q(\alpha, \beta) = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2

Using calculus it can be shown that the values of α and β that minimize the objective function Q are

\begin{aligned}
\hat{\beta} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
 = \frac{\operatorname{Cov}[x, y]}{\operatorname{Var}[x]}
 = r_{xy} \frac{s_y}{s_x}, \\
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x},
\end{aligned}

where r_{xy} is the sample correlation coefficient between x and y, s_x is the standard deviation of x, and s_y is correspondingly the standard deviation of y. A horizontal bar over a variable means the sample average of that variable. For example: \overline{xy} = \tfrac{1}{n} \sum_{i=1}^{n} x_i y_i.
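These closed-form estimators translate directly into code. Here is a sketch (Python with numpy; the simulated data are illustrative assumptions) that computes β̂ = Cov[x, y]/Var[x] and α̂ = ȳ − β̂x̄, and checks the equivalent expression r_xy · s_y/s_x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)  # illustrative data

# beta_hat = Cov[x, y] / Var[x];  alpha_hat = ybar - beta_hat * xbar
xbar, ybar = x.mean(), y.mean()
beta_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
alpha_hat = ybar - beta_hat * xbar

# The equivalent form beta_hat = r_xy * s_y / s_x
r_xy = np.corrcoef(x, y)[0, 1]
assert np.isclose(beta_hat, r_xy * y.std() / x.std())

print(alpha_hat, beta_hat)  # near the true intercept 1.0 and slope 2.0
```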

Substituting the above expressions for \hat{\alpha} and \hat{\beta} into y = \alpha + \beta x yields fitted values

\hat{y} = \hat{\alpha} + \hat{\beta} x,

which yields

\frac{\hat{y} - \bar{y}}{s_y} = r_{xy} \frac{x - \bar{x}}{s_x}

This shows the role r_{xy} plays in the regression line of standardized data points.

If −1 < r_{xy} < 1, then we say that the data points exhibit regression toward the mean. In other words, if linear regression is the appropriate model for a set of data points whose sample correlation coefficient is not perfect, then there is regression toward the mean. The predicted (or fitted) standardized value of y is closer to its mean than the standardized value of x is to its mean.
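That last statement can be checked numerically. In this sketch (Python with numpy; the correlated pair is an illustrative assumption, constructed to have population correlation 0.6), the standardized fitted value is always a factor r_xy closer to the mean than the standardized x:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# A correlated pair with |r| < 1 (population correlation 0.6 by construction)
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

r_xy = np.corrcoef(x, y)[0, 1]
beta_hat = r_xy * y.std() / x.std()
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

# Standardize both sides: (y_hat - ybar)/s_y equals r_xy * (x - xbar)/s_x
z_x = (x - x.mean()) / x.std()
z_yhat = (y_hat - y.mean()) / y.std()
assert np.allclose(z_yhat, r_xy * z_x)

# With |r_xy| < 1 every prediction has regressed toward the mean
assert np.all(np.abs(z_yhat) <= np.abs(z_x))
print(r_xy)
```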

───

 

, and explore the meaning of Figure 8?!

The convolution theorem and its applications

What is a convolution?

One of the most important concepts in Fourier theory, and in crystallography, is that of a convolution. Convolutions arise in many guises, as will be shown below. Because of a mathematical property of the Fourier transform, referred to as the convolution theorem, it is convenient to carry out calculations involving convolutions.

But first we should define what a convolution is. Understanding the concept of a convolution operation is more important than understanding a proof of the convolution theorem, but it may be more difficult!

Mathematically, a convolution is defined as the integral over all space of one function at x times another function at u-x. The integration is taken over the variable x (which may be a 1D or 3D variable), typically from minus infinity to infinity over all the dimensions. So the convolution is a function of a new variable u, as shown in the following equations. The cross in a circle is used to indicate the convolution operation.

(f \otimes g)(u) = \int f(x)\, g(u - x)\, dx
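The definition carries over directly to the discrete case, where the integral becomes a sum over x. Here is a sketch (Python with numpy; the example arrays are arbitrary) that implements that sum literally and checks it against numpy's built-in convolution:

```python
import numpy as np

def convolve_direct(f, g):
    """Discrete form of the definition: (f * g)(u) = sum over x of f(x) * g(u - x)."""
    out = np.zeros(len(f) + len(g) - 1)
    for u in range(len(out)):
        for x in range(len(f)):
            if 0 <= u - x < len(g):   # g(u - x) is zero outside its support
                out[u] += f[x] * g[u - x]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

assert np.allclose(convolve_direct(f, g), np.convolve(f, g))
print(convolve_direct(f, g))
```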

Note that it doesn’t matter which function you take first, i.e. the convolution operation is commutative. We’ll prove that below, but you should think about this in terms of the illustration below. This illustration shows how you can think about the convolution, as giving a weighted sum of shifted copies of one function: the weights are given by the function value of the second function at the shift vector. The top pair of graphs shows the original functions. The next three pairs of graphs show (on the left) the function g shifted by various values of x and, on the right, that shifted function g multiplied by f at the value of x.

illustration of convolution

The bottom pair of graphs shows, on the left, the superposition of several weighted and shifted copies of g and, on the right, the integral (i.e. the sum of all the weighted, shifted copies of g). You can see that the biggest contribution comes from the copy shifted by 3, i.e. the position of the peak of f.

If one of the functions is unimodal (has one peak), as in this illustration, the other function will be shifted by a vector equivalent to the position of the peak, and smeared out by an amount that depends on how sharp the peak is. But alternatively we could switch the roles of the two functions, and we would see that the bimodal function g has doubled the peaks of the unimodal function f.
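Both properties are cheap to verify numerically. The following sketch (Python with numpy; the arrays are arbitrary illustrations) checks commutativity, and that convolving with a perfectly sharp single peak (a delta) merely shifts the other function by the peak position, with no smearing:

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.random(50)
g = rng.random(30)

# Commutativity: f * g == g * f
assert np.allclose(np.convolve(f, g), np.convolve(g, f))

# A unimodal function in the limit of a perfectly sharp peak: a delta at 3.
# Convolving with it shifts f by 3 and smears it not at all.
delta = np.zeros(10)
delta[3] = 1.0
shifted = np.convolve(f, delta)
assert np.allclose(shifted[3:3 + f.size], f)
print("commutativity and shift checks passed")
```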

─── Excerpted from 《勇闖新世界︰ W!o《卡夫卡村》變形祭︰品味科學‧教具教材‧【專題】 PD‧箱子世界‧摺積》