【鼎革‧革鼎】︰ Raspbian Stretch 《六之 J.3‧MIR-13.0 》


Linear classifier

In the field of machine learning, the goal of statistical classification is to use an object’s characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object’s characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.[1]


If the input feature vector to the classifier is a real vector \vec x, then the output score is

y = f(\vec{w}\cdot\vec{x}) = f\left(\sum_j w_j x_j\right),

where \vec w is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. (In other words, \vec{w} is a one-form or linear functional mapping \vec x onto R.) The weight vector \vec w is learned from a set of labeled training samples. Often f is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex f might give the probability that an item belongs to a certain class.
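The scoring rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the weights, and the feature vectors are hypothetical, and the logistic function is just one possible choice for a "more complex f" that returns a probability.

```python
import math


def linear_score(w, x):
    """Return the dot product w . x = sum_j w_j * x_j."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x))


def threshold_classifier(w, x, threshold=0.0):
    """Simple f: scores above the threshold map to class 1, all others to class 0."""
    return 1 if linear_score(w, x) > threshold else 0


def logistic_classifier(w, x):
    """One possible 'more complex f' (an assumption): squash the score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-linear_score(w, x)))


# Hypothetical weights and feature vectors, chosen only for illustration.
w = [2.0, -1.0, 0.5]
x_a = [1.0, 0.0, 2.0]   # score = 2.0 + 0.0 + 1.0 = 3.0
x_b = [0.0, 3.0, 1.0]   # score = 0.0 - 3.0 + 0.5 = -2.5

label_a = threshold_classifier(w, x_a)
label_b = threshold_classifier(w, x_b)
prob_a = logistic_classifier(w, x_a)
```

In practice \vec w would be learned from labeled training samples rather than written by hand as it is here.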

For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as “yes”, while the others are classified as “no”.

A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when \vec x is sparse. Also, linear classifiers often work very well when the number of dimensions in \vec x is large, as in document classification, where each element in \vec x is typically the number of occurrences of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized.


In this case, the solid and empty dots can be correctly classified by any number of linear classifiers. H1 (blue) classifies them correctly, as does H2 (red). H2 could be considered “better” in the sense that it is also furthest from both groups. H3 (green) fails to correctly classify the dots.




Suppose someone were asked to “tell apart” “what is what” in the figure below?








[Figure: eight panels, labeled 4-1 through 4-8]





─── From 《神經網絡【Perceptron】五》 (Neural Networks: Perceptron, Part 5)


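On reflection, representing an object \vec{x} by a “feature vector” is really quite “abstract”! For instance, why can a handwritten digit made of 28 \times 28 two-dimensional pixels be represented by 784 one-dimensional components as its “input feature”? And what exactly is being computed as the “difference” between two such handwritten digits? Consider, for example,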


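Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. The cosine of 0° is 1, and for any other angle between two vectors it is less than 1, with a minimum value of -1. The cosine of the angle between two vectors therefore indicates whether they point in roughly the same direction. When two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; when they point in exactly opposite directions, it is -1. The result does not depend on the lengths of the vectors, only on their directions. Cosine similarity is usually applied in positive space, so the values it gives lie between 0 and 1.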


It is also commonly used to compare documents in text mining. In addition, in data mining it is used to measure cohesion within clusters.[1]



  {\mathbf {a}}\cdot {\mathbf {b}}=\left\|{\mathbf {a}}\right\|\left\|{\mathbf {b}}\right\|\cos \theta

Given two attribute vectors, A and B, the cosine similarity \cos(\theta) is given by their dot product and their lengths, as follows:

{\text{similarity}}=\cos(\theta)={A\cdot B \over \|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}}\,\sqrt{\sum_{i=1}^{n} B_{i}^{2}}},

where A_{i} and B_{i} are the components of the vectors A and B, respectively.


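For text matching, the attribute vectors A and B are usually the term-frequency vectors of the documents. Cosine similarity can be seen as a way of normalizing for document length during the comparison.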
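The cosine-similarity formula and its length-normalizing effect can be checked with a minimal sketch. The function name and the three term-frequency vectors below are hypothetical, made up only for illustration; raising an error for a zero vector is a design assumption, since the formula is undefined there.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical term-frequency vectors over a three-word vocabulary.
doc_short = [1, 2, 0]
doc_long = [10, 20, 0]   # same word proportions as doc_short, ten times longer
doc_other = [0, 0, 5]    # shares no words with doc_short

sim_same_direction = cosine_similarity(doc_short, doc_long)
sim_orthogonal = cosine_similarity(doc_short, doc_other)
```

Here doc_short and doc_long differ only in length, so their similarity comes out as 1 up to floating-point rounding, while doc_short and doc_other share no terms and score 0.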





 In this exercise notebook, we will segment audio files, extract features from the segments, and analyze them. Goals:
  1. Detect onsets in an audio signal.
  2. Segment the audio signal at each onset.
  3. Compute features for each segment.
  4. Gain intuition into the features by listening to each segment separately.
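
The first three goals above can be sketched end to end on a synthetic signal. This is not the notebook's own method: it uses only the Python standard library and swaps in a very simple energy-rise onset detector, where a real exercise would more likely use an audio library. The sample rate, frame size, threshold, function names, and the choice of zero-crossing rate as the example feature are all assumptions made for illustration. Goal 4 (listening) is left out because it needs audio playback.

```python
import math

SR = 8000     # sample rate in Hz (assumed)
FRAME = 200   # analysis frame length in samples (assumed), i.e. 25 ms


def make_signal():
    """Synthetic one-second signal: silence with two tone bursts at known positions."""
    signal = [0.0] * SR
    bursts = [(2000, 440.0), (5000, 1760.0)]  # (start sample, frequency in Hz)
    for start, freq in bursts:
        for i in range(1000):
            signal[start + i] = math.sin(2 * math.pi * freq * i / SR)
    return signal


def frame_energies(signal, frame=FRAME):
    """Sum of squared samples in each non-overlapping frame."""
    return [sum(s * s for s in signal[i:i + frame])
            for i in range(0, len(signal) - frame + 1, frame)]


def detect_onsets(signal, frame=FRAME, threshold=1.0):
    """Start samples of frames whose energy rises above the threshold."""
    onsets = []
    previous = 0.0
    for k, energy in enumerate(frame_energies(signal, frame)):
        if energy > threshold and previous <= threshold:
            onsets.append(k * frame)
        previous = energy
    return onsets


def segment(signal, onsets, length):
    """Cut a fixed-length segment starting at each onset."""
    return [signal[o:o + length] for o in onsets]


def zero_crossing_rate(seg):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
    return crossings / (len(seg) - 1)


signal = make_signal()
onsets = detect_onsets(signal)                      # goal 1
segments = segment(signal, onsets, length=1000)     # goal 2
features = [zero_crossing_rate(s) for s in segments]  # goal 3
```

The higher-pitched burst yields the larger zero-crossing rate, which hints at why such a feature can help distinguish segments.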