STEM Notes: Classical Mechanics: Dynamics [5.5.C]

In 1953 the British mathematician John Edensor Littlewood published a book titled "A Mathematician's Miscellany", in which he discussed a paradox concerning a "supertask". Later, in 1988, the American statistician Sheldon Ross extended the paradox in "A First Course in Probability". It is now known as the Ross–Littlewood paradox, or the "balls and vase problem":

Imagine an empty "vase" and an infinite supply of "balls". A task with infinitely many steps is to be carried out; at each step some balls are "put into" the vase and some are "removed". The question posed is: when the task is complete, how many balls are in the vase?

To carry out this infinite task, suppose the vase is empty one minute before noon and the following steps are performed:

The first step is performed 30 seconds before noon.
The second step is performed 15 seconds before noon.
Each subsequent step is performed at half the previous step's time before noon; that is, the n-th step is performed 2^{-n} minutes before noon.

This guarantees that countably infinitely many steps are performed before noon. Since each step occurs at half the time of the previous one, the infinitely many steps are all completed exactly at noon. At each step ten balls are put into the vase and one ball is removed, so at noon, how many balls are in the vase?
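Before looking at the competing answers, a small Python sketch of a finite truncation of the supertask makes the tension concrete. It assumes a hypothetical labelling in which the lowest-numbered ball is removed at each step (the problem statement above does not say which ball is removed): after n steps the vase contains 9n balls, yet each individual ball k has already left the vase by step k.

```python
# Finite truncation of the supertask: at step n, balls 10n-9 .. 10n go in and
# (under the hypothetical labelling assumed here) the lowest-numbered ball comes out.

def run_vase(n_steps):
    vase = set()
    for n in range(1, n_steps + 1):
        vase.update(range(10 * n - 9, 10 * n + 1))   # put ten labelled balls in
        vase.remove(min(vase))                       # take the lowest-numbered ball out
    return vase

for n in (1, 2, 5, 100):
    print(n, "steps ->", len(run_vase(n)), "balls in the vase")
# After n steps the vase holds 9n balls, yet ball k is removed at step k,
# so under this labelling every individual ball eventually leaves the vase.
```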

Many scholars have proposed different answers to this question: "infinitely many", "the vase is empty", "any number you like", "the problem is ill posed", and so on. Why is that? Suppose we have two countable infinite series, one convergent, c_1 + c_2 + c_3 + \cdots + c_n + \cdots, and one divergent, d_1 + d_2 + d_3 + \cdots + d_n + \cdots. From the theory of limits, the terms of a convergent series of positive terms (an absolutely convergent series) may be summed in any order and its limit does not change. A divergent series, however, yields different values under different orders of summation, which is precisely why it is called divergent. The vase-and-balls problem puts a convergent series, 30 \cdot (1 + 2^{-1} + 2^{-2} + \cdots + 2^{-n} + \cdots), into one-to-one correspondence with a divergent series, (10-1) + (10-1) + (10-1) + \cdots + (10-1) + \cdots. No wonder so many different answers appear; the fact is that this kind of correspondence is not a legitimate operation!!
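The contrast between the two series can be checked numerically. A minimal Python sketch, assuming the step times of 30, 15, 7.5, ... seconds quoted above, shows the partial sums of the time series approaching 60 seconds (one minute) while the partial sums of the net ball count (10 - 1 per step) grow without bound.

```python
# Partial sums of the two series from the vase-and-balls argument.
# Time series: 30 * (1 + 1/2 + 1/4 + ...) seconds -> converges to 60 s (noon).
# Ball series: (10 - 1) + (10 - 1) + ...          -> diverges (9 per step).

def partial_sums(n_terms):
    time_sum, ball_sum = 0.0, 0
    for n in range(n_terms):
        time_sum += 30.0 * 2 ** (-n)   # seconds elapsed up to step n+1
        ball_sum += 10 - 1             # net balls added at each step
    return time_sum, ball_sum

for n in (5, 10, 20, 50):
    t, b = partial_sums(n)
    print(f"{n:3d} steps: elapsed = {t:.6f} s, net balls = {b}")
# The elapsed time approaches 60 s no matter how the terms are grouped,
# while the ball count grows without bound.
```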

─── from 《改不改??變不變!!》

 

As the saying goes, stones from other hills may polish one's own jade: the two subjects are alike in nature. As another saying goes, different roads lead to the same destination: the goal is the same, only the methods differ. Is such a thing easy to grasp?

Reading books by rote is truly no match for putting them to good use! Do experts indeed think alike??

Tikhonov regularization

Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, in machine learning it is known as weight decay, and with multiple independent discoveries, it is also variously known as the Tikhonov–Miller method, the Phillips–Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg–Marquardt algorithm for non-linear least-squares problems.

Suppose that for a known matrix A and vector \mathbf{b}, we wish to find a vector \mathbf{x} such that:

A\mathbf{x} = \mathbf{b}

The standard approach is ordinary least squares linear regression. However, if no \mathbf{x} satisfies the equation or more than one \mathbf{x} does (that is, the solution is not unique), the problem is said to be ill posed. In such cases, ordinary least squares estimation leads to an overdetermined (over-fitted), or more often an underdetermined (under-fitted), system of equations. Most real-world phenomena act as low-pass filters in the forward direction where A maps \mathbf{x} to \mathbf{b}. Therefore, in solving the inverse problem, the inverse mapping operates as a high-pass filter that has the undesirable tendency of amplifying noise (eigenvalues / singular values are largest in the reverse mapping where they were smallest in the forward mapping). In addition, ordinary least squares implicitly nullifies every element of the reconstructed version of \mathbf{x} that lies in the null space of A, rather than allowing a model to be used as a prior for \mathbf{x}. Ordinary least squares seeks to minimize the sum of squared residuals, which can be compactly written as:
\|A\mathbf{x} - \mathbf{b}\|_{2}^{2}
where \|\cdot\|_{2} is the Euclidean norm.
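The noise amplification described above is easy to see numerically. The following sketch uses a hypothetical, nearly rank-deficient matrix A and numpy's least-squares solver: a perturbation of b on the order of 10^{-3} produces a large change in the recovered x.

```python
import numpy as np

# Hypothetical ill-conditioned forward operator: two nearly collinear columns.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
x_true = np.array([1.0, 2.0])
b = A @ x_true

# Ordinary least squares: minimize ||A x - b||_2^2.
x_clean = np.linalg.lstsq(A, b, rcond=None)[0]

# Add a small amount of noise to b; the recovered x changes drastically.
rng = np.random.default_rng(0)
b_noisy = b + 1e-3 * rng.standard_normal(b.shape)
x_noisy = np.linalg.lstsq(A, b_noisy, rcond=None)[0]

print("clean solution :", x_clean)
print("noisy solution :", x_noisy)   # large deviation despite the 1e-3 noise
```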

In order to give preference to a particular solution with desirable properties, a regularization term can be included in this minimization:

\|A\mathbf{x} - \mathbf{b}\|_{2}^{2} + \|\Gamma \mathbf{x}\|_{2}^{2}

for some suitably chosen Tikhonov matrix \Gamma. In many cases, this matrix is chosen as a multiple of the identity matrix (\Gamma = \alpha I), giving preference to solutions with smaller norms; this is known as L2 regularization.[1] In other cases, high-pass operators (e.g., a difference operator or a weighted Fourier operator) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous. This regularization improves the conditioning of the problem, thus enabling a direct numerical solution. An explicit solution, denoted by \hat{x}, is given by:
\hat{x} = (A^{\top}A + \Gamma^{\top}\Gamma)^{-1} A^{\top}\mathbf{b}
The effect of regularization may be varied via the scale of the matrix \Gamma. For \Gamma = 0 this reduces to the unregularized least-squares solution, provided that (A^{\top}A)^{-1} exists.
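The closed-form solution above is straightforward to check numerically. A minimal sketch, reusing the hypothetical ill-conditioned A from the earlier example and taking \Gamma = \alpha I, compares the regularized estimate with ordinary least squares on a noisy right-hand side.

```python
import numpy as np

def tikhonov(A, b, Gamma):
    """Closed-form Tikhonov solution: (A^T A + Gamma^T Gamma)^{-1} A^T b."""
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

# Hypothetical ill-conditioned system with a noisy right-hand side.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
x_true = np.array([1.0, 2.0])
rng = np.random.default_rng(0)
b = A @ x_true + 1e-3 * rng.standard_normal(3)

alpha = 0.05                      # regularization strength, Gamma = alpha * I
Gamma = alpha * np.eye(2)

x_ols = np.linalg.lstsq(A, b, rcond=None)[0]
x_reg = tikhonov(A, b, Gamma)

print("OLS     :", x_ols)         # blown up by the noise
print("Tikhonov:", x_reg)         # smaller norm, much closer to x_true
```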

L2 regularization is used in many contexts aside from linear regression, such as classification with logistic regression or support vector machines,[2] and matrix factorization.[3]

History

Tikhonov regularization was invented independently in many different contexts. It became widely known from its application to integral equations in the work of Andrey Tikhonov and David L. Phillips. Some authors use the term Tikhonov–Phillips regularization. The finite-dimensional case was expounded by Arthur E. Hoerl, who took a statistical approach, and by Manus Foster, who interpreted this method as a Wiener–Kolmogorov (Kriging) filter. Following Hoerl, it is known in the statistical literature as ridge regression.

Bayesian interpretation

Although at first the choice of the solution to this regularized problem may look artificial, and indeed the matrix \Gamma seems rather arbitrary, the process can be justified from a Bayesian point of view. Note that for an ill-posed problem one must necessarily introduce some additional assumptions in order to obtain a unique solution. Statistically, the prior probability distribution of x is sometimes taken to be a multivariate normal distribution. For simplicity, the following assumptions are made here: the components of x are independent, with zero mean and a common standard deviation \sigma_x. The data are also subject to errors, and the errors in \mathbf{b} are likewise assumed to be independent with zero mean and standard deviation \sigma_b. Under these assumptions the Tikhonov-regularized solution is the most probable solution given the data and the a priori distribution of x, according to Bayes' theorem.[4]
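This interpretation can be verified numerically. The sketch below uses hypothetical data and scipy's generic optimizer as a stand-in for the MAP computation: minimizing the negative log-posterior under the stated Gaussian assumptions returns the same vector as the Tikhonov closed form with \Gamma = (\sigma_b/\sigma_x) I.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data generated under the assumptions in the text.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
x_true = np.array([0.5, -1.0, 2.0])
sigma_x, sigma_b = 1.0, 0.1          # prior and noise standard deviations
b = A @ x_true + sigma_b * rng.standard_normal(20)

def neg_log_posterior(x):
    """Up to a constant: -log p(x | b), with b|x ~ N(Ax, sigma_b^2 I) and x ~ N(0, sigma_x^2 I)."""
    return (np.sum((A @ x - b) ** 2) / (2.0 * sigma_b ** 2)
            + np.sum(x ** 2) / (2.0 * sigma_x ** 2))

x_map = minimize(neg_log_posterior, np.zeros(3)).x

# Tikhonov closed form with Gamma = (sigma_b / sigma_x) * I.
alpha = sigma_b / sigma_x
x_tikhonov = np.linalg.solve(A.T @ A + alpha ** 2 * np.eye(3), A.T @ b)

print("MAP estimate     :", x_map)
print("Tikhonov solution:", x_tikhonov)
print("agree:", np.allclose(x_map, x_tikhonov, atol=1e-4))
```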

If the assumption of normality is replaced by assumptions of homoscedasticity and uncorrelatedness of errors, and if one still assumes zero mean, then the Gauss–Markov theorem entails that the solution is the minimum-variance linear unbiased estimator.

 

This is why we borrow the following

Motivating example

Figure: Explicit numerical methods exhibiting instability when integrating a stiff ordinary differential equation.

Consider the initial value problem

y'(t) = -15\,y(t), \quad t \geq 0, \quad y(0) = 1.     (1)

The exact solution (shown in cyan) is

y(t) = e^{-15t}, with y(t) \to 0 as t \to \infty.     (2)

We seek a numerical solution that exhibits the same behavior.

The figure (right) illustrates the numerical issues for various numerical integrators applied to the equation.

  1. Euler’s method with a step size of h = 1/4 oscillates wildly and quickly exits the range of the graph (shown in red).
  2. Euler’s method with half the step size, h = 1/8, produces a solution that stays within the graph boundaries but oscillates about zero (shown in green).
  3. The trapezoidal method (that is, the two-stage Adams–Moulton method) is given by
    y_{n+1} = y_{n} + \frac{1}{2}h\left(f(t_{n},y_{n}) + f(t_{n+1},y_{n+1})\right),   (3)

    where y' = f(t,y). Applying this method instead of Euler’s method gives a much better result (blue). The numerical results decrease monotonically to zero, just as the exact solution does.
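The behaviour described in items 1 to 3 can be reproduced with a few lines of Python. The sketch below uses the step sizes h = 1/4 and h = 1/8 quoted in the list; the step size used for the trapezoidal rule in the figure is not stated, so h = 1/8 is assumed here.

```python
import numpy as np

LAM = -15.0                       # the test equation y' = LAM * y, y(0) = 1

def explicit_euler(h, n_steps):
    """Explicit Euler: y_{n+1} = y_n + h * f(t_n, y_n) = (1 + h*LAM) * y_n."""
    y = [1.0]
    for _ in range(n_steps):
        y.append((1.0 + h * LAM) * y[-1])
    return np.array(y)

def trapezoidal(h, n_steps):
    """Trapezoidal rule on the linear test equation; the implicit step has the
    closed form y_{n+1} = (1 + h*LAM/2) / (1 - h*LAM/2) * y_n."""
    factor = (1.0 + h * LAM / 2.0) / (1.0 - h * LAM / 2.0)
    y = [1.0]
    for _ in range(n_steps):
        y.append(factor * y[-1])
    return np.array(y)

print("Euler, h = 1/4 :", explicit_euler(0.25, 6))   # |1 + h*LAM| = 2.75 > 1: oscillates and blows up
print("Euler, h = 1/8 :", explicit_euler(0.125, 6))  # |1 + h*LAM| = 0.875 < 1 but negative: decaying oscillation
print("Trapezoidal    :", trapezoidal(0.125, 6))     # factor = 0.032 > 0: monotone decay toward zero
print("Exact y(0.75)  :", np.exp(LAM * 0.75))
```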

 

and adapt it to the usage of the PyTrajectory library, first examining the phenomenon in which the 'initial' and 'final' states conform to the exact solution (that is to say, the 'control' u is in fact equal to 0)!!
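As a warm-up for that check, here is a minimal sketch that does not reproduce PyTrajectory's own interface. It treats a hypothetical controlled version of the motivating equation, y' = -15y + u, fixes the control u at zero, and integrates from the 'initial' state with a stiff-aware implicit method to confirm that the 'final' state matches the exact solution.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical stand-in for the boundary-state check described above:
# take the motivating equation with an additive control, y' = -15*y + u,
# fix u = 0, and verify that integrating from the 'initial' state y(0) = 1
# reproduces the 'final' state given by the exact solution y(T) = exp(-15*T).

def rhs(t, y, u=0.0):
    return -15.0 * y + u

T = 0.5
sol = solve_ivp(rhs, (0.0, T), [1.0], method="Radau",  # implicit method, suited to stiff problems
                rtol=1e-9, atol=1e-12)

y_final = sol.y[0, -1]
y_exact = np.exp(-15.0 * T)
print("integrated y(T):", y_final)
print("exact      y(T):", y_exact)
print("difference     :", abs(y_final - y_exact))
```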