GoPiGo Car: Grid-Point Image Arithmetic, "Projective Geometry" [5 · Linear Algebra], "Guide 8": The Observer, "Transformation · G"

It is said that Su Dongpo once read this poem by Wang Mojie (Wang Wei):

"In the Mountains", by Wang Wei

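White stones jut from the Lantian hills,
Red leaves grow sparse along the Jade Stream;
On the mountain path there was no rain at all,
Yet the empty green soaks one's clothes.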

and, moved by it, remarked:

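Savor Wang Mojie's poems, and there are paintings in the poems; look at Wang Mojie's paintings, and there are poems in the paintings.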

A painting by Wang Wei

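Can the scenery of heaven and earth be graceful and moving? That depends only on what the observer feels. A feeling stirs a thought, the voice of the heart rings out and becomes writing, and naturally the two answer each other! This is beauty in stillness. If at this moment one reads Wang Zhihuan's "On the Stork Tower":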

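The white sun sinks behind the mountains,
The Yellow River flows into the sea.
To gaze a thousand li further,
Climb one more storey.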

might one then find beauty in motion?!

This famous ancient tower,

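The original Stork Tower stood at Puzhou, Shanxi (today Puzhou Town, Yongji City, Shanxi Province)[1]. During the Northern and Southern Dynasties, the Northern Zhou and the Northern Qi faced each other here in a military standoff. For defense, the Northern Zhou general Yuwen Hu built a tall tower outside the west gate of Puzhou as a military lookout. Because storks often perched and nested on it, it came to be called the "Stork Tower". Since it overlooks the Yellow River, it drew many writers through the ages to climb it, muse on the past, and pour out their feelings[2]. Shen Kuo's Dream Pool Essays records: "Many Tang poets left poems at the Stork Tower of Hezhong Prefecture, but only the three by Li Yi, Wang Zhihuan, and Chang Zhu truly capture its scenery."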

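has long since been destroyed by war. Even though it has been rebuilt, and visitors can once again climb it to take in "the magnificent sight of the white sun against the mountains and the Yellow River flowing into the sea", the feeling of former times is hard to recapture.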

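Why has this poem become a masterpiece for the ages? Perhaps what it says on the surface and what it means underneath differ greatly? Readers of different ages and moods may well read it very differently! Let me briefly share my impressions: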

[The white sun sinks behind the mountains]

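Days come, days go. Of all the hours in a day, the morning glow and the setting sun move us most. So the time when "the white sun shines high in the sky" slips by easily, and suddenly we realize with a start that "the day is nearly over". This "sun sinking behind the mountains" is really the sun sinking behind the mountains day after day; the single word "white" (白, which also means "blank") conveys time that "slips away easily", and still, day after day, leaves things "blank". It seems to paint a scene at "dusk", yet it carries the sense that "the day is nearly over".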

[The Yellow River flows into the sea.]

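Every river flows on without ceasing; why should only the "Yellow River" flow into the sea? Though it is a plain statement suited to the scene, it always seems to hint at life slipping away "moment by moment". Where does the traveler at the world's edge mean to go? What does he hope to achieve? In the end it all happens within a "finite" stream of time: even if the waters of the "Yellow River" "come from the heavens", a ceaseless stream must surely have its moment of "entering the sea".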

[To gaze a thousand li further, climb one more storey.]

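One "climbs high" in order to "see far"; is there something one is after? Beyond the "Yellow River", is there the "state" to serve?! Within the "capital region", can "fame and rank" be had?! Whatever one may be aiming for, how could there be no distinction between great and small? How could there be no difference between high and far??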

─── From "9va-pi: Reading and Thinking (4)"

 

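Rather than speaking mysteriously of the observer never truly seeing the thing in itself, it is better to say plainly that the observer can see the image of a thing. The thing and its image need not belong to the same conceptual world. The image of a thing is nothing more than the perception of a phenomenon. Even when video or photographs reproduce the shape and appearance of a thing, they must ultimately obey the natural laws of vision: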

Illustration of the poem "Deer Park" (painted by Wu Ziyu)

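Wang Wei
Deer Park
On the empty mountain no one is seen,
Yet voices can be heard.
The returning sunlight enters the deep forest
And shines once more upon the green moss.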

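If an event had no reason for happening, could it happen at all? And if the event never happened, how could we ever arrive at the reason for its happening? This is the principle that "event and reason are not two": an event that happens contains the reason it happens, and the reason for happening gives rise to the event. Is "a tree falling on an empty mountain" an event that happened? If someone chopped down your "cherry tree", would that be an event that happened?? Perhaps whether we notice or ask about "worldly affairs" depends on "whether we care", and usually the more we care, the more troubled we become!!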

─── Excerpted from "Raspberry λ: Exploring the Origins of Programs"

 

Still, things are situated in the world; and the eye, being itself a thing, how could it be otherwise?

 

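So the observer gladly takes his own eye as the yardstick. Hence, where the thing and its image differ, or where a coordinate transformation is needed, this amounts to nothing more than turning the data to our own use! Then perhaps one may be spared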

Translate first??

Perspective projection

When the human eye views a scene, objects in the distance appear smaller than objects close by – this is known as perspective. While orthographic projection ignores this effect to allow accurate measurements, perspective projection shows distant objects as smaller to provide additional realism.

Perspective projection requires a more involved definition than orthographic projection. A helpful way to understand its mechanics is to imagine the 2D projection as though the object(s) were being viewed through a camera viewfinder. The camera’s position, orientation, and field of view control the behavior of the projection transformation. The following variables are defined to describe this transformation:

  •   {\mathbf {a}}_{{x,y,z}} – the 3D position of a point A that is to be projected.
  •   {\mathbf {c}}_{{x,y,z}} – the 3D position of a point C representing the camera.
  •   {\mathbf {\theta }}_{{x,y,z}} – The orientation of the camera (represented by Tait–Bryan angles).
  •   {\mathbf {e}}_{{x,y,z}} – the viewer’s position relative to the display surface [3] which goes through point C representing the camera.

Which results in:

  •   {\mathbf {b}}_{{x,y}} – the 2D projection of  \mathbf {a} .

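When  {\mathbf {c}}_{{x,y,z}}=\langle 0,0,0\rangle  and {\mathbf {\theta }}_{{x,y,z}}=\langle 0,0,0\rangle , the camera transform defined below leaves points unchanged; with, for example, {\mathbf {e}}_{{x,y,z}}=\langle 0,0,1\rangle , the 3D vector  \langle 1,2,1\rangle is then projected to the 2D vector  \langle 1,2\rangle .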

Otherwise, to compute {\mathbf {b}}_{{x,y}} we first define a vector  {\mathbf {d}}_{{x,y,z}} as the position of point A with respect to a coordinate system defined by the camera, with origin in C and rotated by  \mathbf {\theta } with respect to the initial coordinate system. This is achieved by subtracting  \mathbf {c} from  \mathbf {a} and then applying a rotation by -{\mathbf {\theta }} to the result. This transformation is often called a camera transform and can be written as follows, with the rotation expressed as successive rotations about the x, y, and z axes (these calculations assume that the axes are ordered as a left-handed system of axes): [4] [5]

{\displaystyle {\begin{bmatrix}\mathbf {d} _{x}\\\mathbf {d} _{y}\\\mathbf {d} _{z}\\\end{bmatrix}}={\begin{bmatrix}1&0&0\\0&{\cos(\mathbf {\theta } _{x})}&{\sin(\mathbf {\theta } _{x})}\\0&{-\sin(\mathbf {\theta } _{x})}&{\cos(\mathbf {\theta } _{x})}\\\end{bmatrix}}{\begin{bmatrix}{\cos(\mathbf {\theta } _{y})}&0&{-\sin(\mathbf {\theta } _{y})}\\0&1&0\\{\sin(\mathbf {\theta } _{y})}&0&{\cos(\mathbf {\theta } _{y})}\\\end{bmatrix}}{\begin{bmatrix}{\cos(\mathbf {\theta } _{z})}&{\sin(\mathbf {\theta } _{z})}&0\\{-\sin(\mathbf {\theta } _{z})}&{\cos(\mathbf {\theta } _{z})}&0\\0&0&1\\\end{bmatrix}}\left({{\begin{bmatrix}\mathbf {a} _{x}\\\mathbf {a} _{y}\\\mathbf {a} _{z}\\\end{bmatrix}}-{\begin{bmatrix}\mathbf {c} _{x}\\\mathbf {c} _{y}\\\mathbf {c} _{z}\\\end{bmatrix}}}\right)}

This representation corresponds to rotating by three Euler angles (more properly, Tait–Bryan angles), using the xyz convention, which can be interpreted either as “rotate about the extrinsic axes (axes of the scene) in the order z, y, x (reading right-to-left)” or “rotate about the intrinsic axes (axes of the camera) in the order x, y, z (reading left-to-right)”. Note that if the camera is not rotated ({\mathbf {\theta }}_{{x,y,z}}=\langle 0,0,0\rangle ), then the matrices drop out (as identities), and this reduces to simply a shift:  {\mathbf {d}}={\mathbf {a}}-{\mathbf {c}}.
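A minimal Python sketch of the camera transform, assuming vectors are plain 3-element sequences and angles are in radians; rot_x, rot_y, rot_z and camera_transform are illustrative names, not a library API. The three matrices are copied from the equation above and applied right to left: first subtract c, then rotate about z, y and x.

```python
import math

# Rotation matrices exactly as written in the camera-transform equation
# (left-handed axes, so they rotate points by -theta).
def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def camera_transform(a, c, theta):
    """d = Rx(theta_x) Ry(theta_y) Rz(theta_z) (a - c)."""
    p = [a[i] - c[i] for i in range(3)]  # shift so the camera sits at the origin
    p = mat_vec(rot_z(theta[2]), p)      # rightmost matrix is applied first
    p = mat_vec(rot_y(theta[1]), p)
    return mat_vec(rot_x(theta[0]), p)
```

With theta = (0, 0, pi/2), the scene point (1, 0, 0) comes out at roughly (0, -1, 0), which is the rotation by -θ described above.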

Alternatively, without using matrices (writing x for a_x - c_x, y for a_y - c_y and z for a_z - c_z, and abbreviating cos θ_i to c_i and sin θ_i to s_i):

{\begin{array}{lcl}{\mathbf {d}}_{x}=c_{y}(s_{z}{\mathbf {y}}+c_{z}{\mathbf {x}})-s_{y}{\mathbf {z}}\\{\mathbf {d}}_{y}=s_{x}(c_{y}{\mathbf {z}}+s_{y}(s_{z}{\mathbf {y}}+c_{z}{\mathbf {x}}))+c_{x}(c_{z}{\mathbf {y}}-s_{z}{\mathbf {x}})\\{\mathbf {d}}_{z}=c_{x}(c_{y}{\mathbf {z}}+s_{y}(s_{z}{\mathbf {y}}+c_{z}{\mathbf {x}}))-s_{x}(c_{z}{\mathbf {y}}-s_{z}{\mathbf {x}})\\\end{array}}
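The expanded formulas can be transcribed directly. In this sketch (hypothetical function name, angles in radians, camera position passed as cam so it does not clash with the cosine abbreviations c_x, c_y, c_z) each shared sub-expression is computed once.

```python
import math

def camera_transform_expanded(a, cam, theta):
    """Matrix-free camera transform, following the expanded formulas term by term."""
    # x, y, z stand for a_x - c_x, a_y - c_y, a_z - c_z
    x, y, z = (a[i] - cam[i] for i in range(3))
    # c_i = cos(theta_i), s_i = sin(theta_i)
    c_x, c_y, c_z = (math.cos(t) for t in theta)
    s_x, s_y, s_z = (math.sin(t) for t in theta)
    u = s_z * y + c_z * x   # (s_z y + c_z x), appears in all three rows
    v = c_z * y - s_z * x   # (c_z y - s_z x)
    w = c_y * z + s_y * u   # (c_y z + s_y (s_z y + c_z x))
    d_x = c_y * u - s_y * z
    d_y = s_x * w + c_x * v
    d_z = c_x * w - s_x * v
    return [d_x, d_y, d_z]
```

Because every step is a rotation, the length of d equals the length of a - c for any angles, which is a quick sanity check on a hand-expanded formula.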

This transformed point can then be projected onto the 2D plane using the following formula (here, x/y is used as the projection plane; the literature may also use x/z):[6]

  {\begin{array}{lcl}{\mathbf {b}}_{x}&=&{\frac {{\mathbf {e}}_{z}}{{\mathbf {d}}_{z}}}{\mathbf {d}}_{x}-{\mathbf {e}}_{x}\\{\mathbf {b}}_{y}&=&{\frac {{\mathbf {e}}_{z}}{{\mathbf {d}}_{z}}}{\mathbf {d}}_{y}-{\mathbf {e}}_{y}\\\end{array}}.
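In code, the projection step is a single division by depth. This sketch keeps the minus sign on e exactly as in the formula above and raises an error for d_z = 0, where the point lies in the camera's own plane and the formula is undefined; the function name project is hypothetical.

```python
def project(d, e):
    """b = (e_z / d_z) * (d_x, d_y) - (e_x, e_y), with d in camera coordinates."""
    d_x, d_y, d_z = d
    e_x, e_y, e_z = e
    if d_z == 0:
        raise ValueError("d_z = 0: the point lies in the plane of the camera")
    k = e_z / d_z  # perspective scale factor: larger depth, smaller image
    return (k * d_x - e_x, k * d_y - e_y)
```

With e = (0, 0, 1), the point (1, 2, 1) projects to (1.0, 2.0), while (1, 2, 2), twice as far away, projects to (0.5, 1.0).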

Or, in matrix form using homogeneous coordinates, the system

{\begin{bmatrix}{\mathbf {f}}_{x}\\{\mathbf {f}}_{y}\\{\mathbf {f}}_{z}\\{\mathbf {f}}_{w}\\\end{bmatrix}}={\begin{bmatrix}1&0&-{\frac {{\mathbf {e}}_{x}}{{\mathbf {e}}_{z}}}&0\\0&1&-{\frac {{\mathbf {e}}_{y}}{{\mathbf {e}}_{z}}}&0\\0&0&1&0\\0&0&1/{\mathbf {e}}_{z}&0\\\end{bmatrix}}{\begin{bmatrix}{\mathbf {d}}_{x}\\{\mathbf {d}}_{y}\\{\mathbf {d}}_{z}\\1\\\end{bmatrix}}

in conjunction with an argument using similar triangles, leads to division by the homogeneous coordinate, giving

{\begin{array}{lcl}{\mathbf {b}}_{x}&=&{\mathbf {f}}_{x}/{\mathbf {f}}_{w}\\{\mathbf {b}}_{y}&=&{\mathbf {f}}_{y}/{\mathbf {f}}_{w}\\\end{array}}.
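As a check on the homogeneous form, the following sketch builds the 4x4 matrix above, multiplies it by (d_x, d_y, d_z, 1) and divides by f_w; the result should agree with the direct formula for b. Plain nested lists stand in for a matrix library, and the function name is illustrative.

```python
def homogeneous_project(d, e):
    """Project via the 4x4 homogeneous matrix, then divide by f_w."""
    e_x, e_y, e_z = e
    m = [
        [1, 0, -e_x / e_z, 0],
        [0, 1, -e_y / e_z, 0],
        [0, 0, 1, 0],
        [0, 0, 1 / e_z, 0],
    ]
    v = [d[0], d[1], d[2], 1]  # homogeneous coordinates of d
    f_x, f_y, f_z, f_w = (sum(row[k] * v[k] for k in range(4)) for row in m)
    if f_w == 0:
        raise ValueError("f_w = 0: point has zero depth")
    return (f_x / f_w, f_y / f_w)  # the perspective divide
```

For d = (3, -1, 4) and e = (0.5, 0.25, 2), this returns (1.0, -0.75), the same as the direct formula: (2/4)·3 - 0.5 = 1.0 and (2/4)·(-1) - 0.25 = -0.75.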

The distance of the viewer from the display surface,  \mathbf{e}_z, directly relates to the field of view: \alpha =2\cdot \tan ^{{-1}}(1/{\mathbf {e}}_{z}) is the viewed angle. (Note: this assumes that the points (-1,-1) and (1,1) are mapped to the corners of the viewing surface.)
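The field-of-view relation and its inverse, as a small sketch (function names hypothetical, angles in radians, same corner mapping as in the note above):

```python
import math

def field_of_view(e_z):
    """Viewing angle alpha = 2 * atan(1 / e_z), in radians."""
    return 2 * math.atan(1 / e_z)

def distance_for_field_of_view(alpha):
    """Inverse relation: e_z = 1 / tan(alpha / 2)."""
    return 1 / math.tan(alpha / 2)
```

So e_z = 1 gives a 90° viewing angle, and a 60° angle needs e_z = √3 ≈ 1.732; a larger e_z means a narrower view.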

The above equations can also be rewritten as:

{\begin{array}{lcl}{\mathbf {b}}_{x}=({\mathbf {d}}_{x}{\mathbf {s}}_{x})/({\mathbf {d}}_{z}{\mathbf {r}}_{x}){\mathbf {r}}_{z}\\{\mathbf {b}}_{y}=({\mathbf {d}}_{y}{\mathbf {s}}_{y})/({\mathbf {d}}_{z}{\mathbf {r}}_{y}){\mathbf {r}}_{z}\\\end{array}}.

In which  {\mathbf {s}}_{{x,y}} is the display size,  {\mathbf {r}}_{{x,y}} is the recording surface size (CCD or film),  {\mathbf {r}}_{z} is the distance from the recording surface to the entrance pupil (camera center), and  {\mathbf {d}}_{z} is the distance, from the 3D point being projected, to the entrance pupil.
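A direct transcription of this formula into Python. The 36 x 24 recording surface, r_z = 50 and 1800 x 1200 display in the usage note are invented numbers; the formula only needs d, r_x, r_y and r_z in one common length unit, and b then comes out in the display's units, measured from the display centre (d_x = 0 maps to b_x = 0). The function name is hypothetical.

```python
def sensor_to_display(d, s, r):
    """b = (d * s) / (d_z * r_xy) * r_z for each of x and y.

    d: camera-space point (d_x, d_y, d_z)
    s: display size (s_x, s_y)
    r: recording-surface size and pupil distance (r_x, r_y, r_z)
    """
    d_x, d_y, d_z = d
    s_x, s_y = s
    r_x, r_y, r_z = r
    b_x = (d_x * s_x) / (d_z * r_x) * r_z
    b_y = (d_y * s_y) / (d_z * r_y) * r_z
    return (b_x, b_y)
```

For example, sensor_to_display((100, 50, 5000), (1800, 1200), (36, 24, 50)) returns (50.0, 25.0): the point lands 1 mm right of centre on the sensor, and 1800/36 = 50 display units per mm scales that to 50.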

Subsequent clipping and scaling operations may be necessary to map the 2D plane onto any particular display media.

Diagram

To determine which screen x-coordinate corresponds to a point at (A_{x}, A_{z}), multiply the point's x-coordinate by the ratio B_z/A_z:

B_{x}=A_{x}{\frac {B_{z}}{A_{z}}}

where

  B_x is the screen x coordinate
  A_x is the model x coordinate
  B_z is the focal length, the axial distance from the camera center to the image plane
  A_z is the subject distance.

Because the camera is in 3D, the same works for the screen y-coordinate, substituting y for x in the above diagram and equation.
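The similar-triangles rule is one multiplication per axis. In this sketch (hypothetical function name), doubling the subject distance A_z halves the screen offset, which is exactly the perspective effect described at the start of this section.

```python
def screen_position(a_x, a_y, a_z, b_z):
    """B = A * B_z / A_z, applied to both x and y."""
    if a_z <= 0:
        raise ValueError("the point must be in front of the camera (A_z > 0)")
    return a_x * b_z / a_z, a_y * b_z / a_z
```

With focal length B_z = 2, the point (3, 6) at distance 6 maps to (1.0, 2.0); moved back to distance 12 it maps to (0.5, 1.0).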

 

Or rotate first!!

Camera Calibration and 3D Reconstruction

The functions in this section use a so-called pinhole camera model. In this model, a scene view is formed by projecting 3D points into the image plane using a perspective transformation.

s \; m' = A [R|t] M'

or

s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
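As a sketch of this matrix form, the following Python multiplies A by [R|t] into a 3x4 projection matrix, applies it to the homogeneous point (X, Y, Z, 1), and divides by the third component s. The helper names are hypothetical and the code is not OpenCV's projectPoints; the intrinsics in the usage note are made-up values.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def project_matrix_form(A, Rt, M):
    """Solve s [u, v, 1]^T = A [R|t] [X, Y, Z, 1]^T for (u, v)."""
    P = matmul(A, Rt)                   # 3x4 projection matrix
    Mh = [[M[0]], [M[1]], [M[2]], [1]]  # homogeneous column vector
    su, sv, s = (row[0] for row in matmul(P, Mh))
    if s == 0:
        raise ValueError("s = 0: point has zero depth in camera coordinates")
    return su / s, sv / s
```

With A = [[500, 0, 320], [0, 500, 240], [0, 0, 1]], R = I and t = 0, the point (0.25, -0.5, 2) lands at pixel (382.5, 115.0).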

where:

  • (X, Y, Z) are the coordinates of a 3D point in the world coordinate space
  • (u, v) are the coordinates of the projection point in pixels
  • A is a camera matrix, or a matrix of intrinsic parameters
  • (cx, cy) is a principal point that is usually at the image center
  • fx, fy are the focal lengths expressed in pixel units.

If an image from the camera is scaled by some factor, all of these parameters (f_x, f_y, c_x, c_y) should be scaled by the same factor. The matrix of intrinsic parameters does not depend on the scene viewed, so once estimated it can be reused as long as the focal length is fixed (in the case of a zoom lens). The joint rotation-translation matrix [R|t] is called the matrix of extrinsic parameters. It describes the camera motion around a static scene or, vice versa, the rigid motion of an object in front of a still camera. That is, [R|t] transforms the coordinates of a point (X, Y, Z) into a coordinate system fixed with respect to the camera. The transformation above is equivalent to the following (when z \ne 0 ):

\begin{array}{l} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t \\ x' = x/z \\ y' = y/z \\ u = f_x x' + c_x \\ v = f_y y' + c_y \end{array}
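The step-by-step form can be sketched the same way. This Python function (hypothetical name, plain tuples, no lens distortion) follows the equations line by line: rotate and translate, divide by z, then apply the focal lengths and principal point.

```python
def project_point(R, t, f_x, f_y, c_x, c_y, X):
    """Pinhole projection without distortion: returns pixel coordinates (u, v)."""
    # Extrinsics: camera coordinates (x, y, z) = R X + t
    x, y, z = (sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3))
    if z == 0:
        raise ValueError("z must be nonzero")
    x_n, y_n = x / z, y / z               # normalized coordinates x', y'
    return f_x * x_n + c_x, f_y * y_n + c_y  # intrinsics: to pixels
```

With R = I, t = 0, f_x = f_y = 500 and (c_x, c_y) = (320, 240) (made-up values), the point (0.25, -0.5, 2) again maps to (382.5, 115.0), the same pixel the matrix form gives.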

The following figure illustrates the pinhole camera model.

Figure: the pinhole camera model

Real lenses usually have some distortion, mostly radial distortion and slight tangential distortion. So, the above model is extended as:

\begin{array}{l} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t \\ x' = x/z \\ y' = y/z \\ x'' = x' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2(r^2 + 2 x'^2) \\ y'' = y' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' \\ \text{where} \quad r^2 = x'^2 + y'^2 \\ u = f_x x'' + c_x \\ v = f_y y'' + c_y \end{array}

k_1, k_2, k_3, k_4, k_5, and k_6 are radial distortion coefficients. p_1 and p_2 are tangential distortion coefficients. Higher-order coefficients are not considered in OpenCV.
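A sketch of the distortion step on its own, taking the normalized coordinates (x', y') and returning (x'', y''); the function name and argument layout are illustrative, not OpenCV's.

```python
def distort(x_n, y_n, k, p):
    """Apply the rational radial + tangential model to normalized (x', y').

    k = (k1, k2, k3, k4, k5, k6) radial coefficients, p = (p1, p2) tangential.
    """
    k1, k2, k3, k4, k5, k6 = k
    p1, p2 = p
    r2 = x_n * x_n + y_n * y_n  # r^2; r^4 = r2**2, r^6 = r2**3
    radial = (1 + k1 * r2 + k2 * r2**2 + k3 * r2**3) / (1 + k4 * r2 + k5 * r2**2 + k6 * r2**3)
    x_d = x_n * radial + 2 * p1 * x_n * y_n + p2 * (r2 + 2 * x_n * x_n)
    y_d = y_n * radial + p1 * (r2 + 2 * y_n * y_n) + 2 * p2 * x_n * y_n
    return x_d, y_d
```

With all coefficients zero the point is unchanged. With only k_1 non-zero, k_1 = -0.1 pulls the point (0.5, 0) inward to x'' = 0.4875, and k_1 = 0.1 pushes it outward to x'' = 0.5125.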

The next figure shows two common types of radial distortion: barrel distortion (typically k_1 < 0) and pincushion distortion (typically k_1 > 0).

Figure: examples of barrel and pincushion distortion

In the functions below, the coefficients are passed or returned as

(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6]])

vector. That is, if the vector contains four elements, it means that k_3=0. The distortion coefficients do not depend on the scene viewed, so they also belong to the intrinsic camera parameters, and they remain the same regardless of the captured image resolution. If, for example, a camera has been calibrated on images of 320 x 240 resolution, exactly the same distortion coefficients can be used for 640 x 480 images from the same camera, while f_x, f_y, c_x, and c_y need to be scaled appropriately.
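Two small helpers, both hypothetical, illustrate this paragraph: one unpacks a 4-, 5- or 8-element vector (k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6]]) with missing terms set to zero, and one rescales f_x, f_y, c_x, c_y for a resized image while the distortion coefficients stay as they are. The rescaling assumes simple proportional scaling and ignores half-pixel-centre conventions.

```python
def expand_distortion(coeffs):
    """Split (k1, k2, p1, p2[, k3[, k4, k5, k6]]) into radial and tangential parts.

    Missing trailing coefficients are zero, so a 4-element vector means k3 = 0.
    Only the 4-, 5- and 8-element layouts described above are accepted here.
    """
    if len(coeffs) not in (4, 5, 8):
        raise ValueError("expected 4, 5 or 8 distortion coefficients")
    k1, k2, p1, p2, k3, k4, k5, k6 = list(coeffs) + [0.0] * (8 - len(coeffs))
    return (k1, k2, k3, k4, k5, k6), (p1, p2)

def rescale_intrinsics(f_x, f_y, c_x, c_y, factor):
    """Scale the intrinsic matrix entries for an image resized by `factor`.

    Distortion coefficients are not taken as input because they do not change.
    """
    return f_x * factor, f_y * factor, c_x * factor, c_y * factor
```

For a camera calibrated at 320 x 240 with (hypothetical) f_x = f_y = 300 and (c_x, c_y) = (160, 120), rescale_intrinsics(300.0, 300.0, 160.0, 120.0, 2) gives the 640 x 480 values (600.0, 600.0, 320.0, 240.0), and the distortion vector is reused unchanged.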

The functions below use the above model to do the following:

  • Project 3D points to the image plane given intrinsic and extrinsic parameters.
  • Compute extrinsic parameters given intrinsic parameters, a few 3D points, and their projections.
  • Estimate intrinsic and extrinsic camera parameters from several views of a known calibration pattern (every view is described by several 3D-2D point correspondences).
  • Estimate the relative position and orientation of the stereo camera “heads” and compute the rectification transformation that makes the camera optical axes parallel.

 

The puzzlement of two equally valid choices ◎
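The two passages quoted above put the translation on opposite sides of the rotation: the camera transform computes d = R(a - c), translating first, while the pinhole model computes x = RX + t, rotating first. A short Python sketch (plain lists, hypothetical helper names, a rotation about z chosen only for illustration) checks numerically that the two agree exactly when t = -Rc, and that naively setting t = -c does not.

```python
import math

def rot_z(theta):
    # Same form as the R_z used in the camera-transform equation
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def translate_then_rotate(R, c, a):
    # Camera-transform style: d = R (a - c)
    return apply(R, [a[i] - c[i] for i in range(3)])

def rotate_then_translate(R, t, X):
    # Extrinsic-parameter style: x = R X + t
    return [r_i + t_i for r_i, t_i in zip(apply(R, X), t)]

def t_from_center(R, c):
    # Linearity gives R(a - c) = R a - R c, so the two match when t = -R c
    return [-v for v in apply(R, c)]
```

Since R(a - c) = Ra - Rc for any linear map R, the choice between "translate first" and "rotate first" only changes which vector is stored: the camera centre c, or t = -Rc.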