To verify that the 派生三 working environment was in good order, I ran the Jupyter notebook for Chapter 1 of Aurélien Géron's book:
handson-ml/01_the_machine_learning_landscape.ipynb
Being a scikit-learn newcomer myself, I felt that hunting through "library call" documentation piece by piece was no way to learn; what I really wanted to know was the API design blueprint.
Then, by chance, I came across
API design for machine learning software: experiences from the scikit-learn project
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
Submission history
From: Gael Varoquaux [view email]
[v1] Sun, 1 Sep 2013 16:22:48 UTC (28 KB)
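The abstract's "simple and elegant interface shared by all learning and processing units" boils down to the fit/predict/transform convention. Here is a minimal sketch of that shared API and of the composition it enables; the toy data and the particular estimator choices are my own, not from the paper:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny invented dataset: two features, two classes.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 3.0]]
y = [0, 0, 1, 1]

# Every estimator exposes fit(); predictors add predict(),
# transformers add transform().  That is the whole contract.
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[2.5, 2.5]])

# Because the contract is uniform, estimators compose: a Pipeline
# chains a transformer and a predictor and is itself an estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
pipe_pred = pipe.predict([[2.5, 2.5]])
print(pred, pipe_pred)
```

This uniformity is exactly what the paper credits for the library's composability: anything with fit() plugs into the same machinery.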
What a delightful stroke of luck!!
……
………
In a burst of diligence, it felt as though one could finally take a bird's-eye view of the documentation's structure?
Classification: identifying to which category an object belongs.
    Applications: spam detection, image recognition.
    Algorithms: SVM, nearest neighbors, random forest, …

Regression: predicting a continuous-valued attribute associated with an object.

Clustering: automatic grouping of similar objects into sets.
    Applications: customer segmentation, grouping experiment outcomes.
    Algorithms: k-means, spectral clustering, mean-shift, …

Dimensionality reduction: reducing the number of random variables to consider.
    Applications: visualization, increased efficiency.
    Algorithms: PCA, feature selection, non-negative matrix factorization.

Model selection: comparing, validating and choosing parameters and models.

Preprocessing: feature extraction and normalization.
    Applications: transforming input data such as text for use with machine learning algorithms.
    Modules: preprocessing, feature extraction.
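The categories above all resolve to estimators obeying the same fit-based interface. A quick sketch touching two of them, dimensionality reduction (PCA) and clustering (k-means); the blob data is invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two well-separated blobs in 3-D: one near the origin, one shifted by +10.
X = np.vstack([rng.randn(20, 3), rng.randn(20, 3) + 10.0])

# Dimensionality reduction: PCA is a transformer (fit + transform).
X2 = PCA(n_components=2).fit_transform(X)

# Clustering: KMeans is fit the same way and stores its result in labels_.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X2)
print(X2.shape, np.bincount(km.labels_))
```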
…
1.1. Generalized Linear Models
The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if ŷ is the predicted value, then

    ŷ(w, x) = w₀ + w₁x₁ + … + wₚxₚ

Across the module, we designate the vector w = (w₁, …, wₚ) as coef_ and w₀ as intercept_.

To perform classification with generalized linear models, see Logistic regression.
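The identity ŷ = Xw + w₀ can be checked directly against predict; a small sketch with invented numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([3.0, 2.0, 7.0, 6.0])

reg = LinearRegression().fit(X, y)

# predict() is exactly the linear combination w0 + w1*x1 + ... + wp*xp,
# with w stored in coef_ and w0 in intercept_.
manual = X @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(X)))  # True
```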
1.1.1. Ordinary Least Squares
LinearRegression fits a linear model with coefficients w = (w₁, …, wₚ) to minimize the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation. Mathematically it solves a problem of the form:

    min over w of ‖Xw − y‖₂²

LinearRegression will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:
>>> from sklearn import linear_model
>>> reg = linear_model.LinearRegression()
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
>>> reg.coef_
array([0.5, 0.5])
However, coefficient estimates for Ordinary Least Squares rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the design matrix becomes close to singular and as a result, the least-squares estimate becomes highly sensitive to random errors in the observed response, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
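This sensitivity is easy to reproduce. A hedged sketch with invented data: two almost identical columns make the OLS coefficients blow up in opposite directions, while ridge regression, a common remedy, keeps them tame:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
n = 50
x1 = rng.randn(n)
x2 = x1 + 1e-6 * rng.randn(n)      # second column nearly equals the first
X = np.column_stack([x1, x2])      # design matrix is close to singular
y = x1 + 0.1 * rng.randn(n)        # target depends only on the shared direction

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS spreads huge opposite-signed weights across the correlated columns;
# the ridge penalty shrinks them back to a stable solution.
print(ols.coef_)
print(ridge.coef_)
```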
Examples:
……
API Reference
This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.
………
sklearn.linear_model.LinearRegression

class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)

    Ordinary least squares Linear Regression.
Parameters:

    fit_intercept : boolean, optional, default True
        Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

    normalize : boolean, optional, default False
        This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

    copy_X : boolean, optional, default True
        If True, X will be copied; else, it may be overwritten.

    n_jobs : int or None, optional (default=None)
        The number of jobs to use for the computation. This will only provide speedup for n_targets > 1 and sufficiently large problems. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
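The normalize description above points to sklearn.preprocessing.StandardScaler; a minimal sketch of that recommended pattern, with invented data whose two columns live on very different scales:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Columns on wildly different scales (invented numbers).
X = np.array([[100.0, 0.001], [200.0, 0.002], [300.0, 0.004], [400.0, 0.003]])
y = np.array([1.0, 2.0, 4.0, 3.0])

# Standardize inside a Pipeline rather than passing normalize=True,
# as the parameter description recommends.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
print(model.predict(X))
```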
Attributes:

    coef_ : array, shape (n_features, ) or (n_targets, n_features)
        Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

    intercept_ : array
        Independent term in the linear model.
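The coef_ shapes described above are easy to confirm with a single-target and a two-target fit (data invented here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 9.0]])
y1 = np.array([0.0, 2.0, 6.0, 12.0])

# One target (y is 1-D): coef_ has shape (n_features,)
single = LinearRegression().fit(X, y1)
print(single.coef_.shape)   # (2,)

# Two targets (y is 2-D): coef_ has shape (n_targets, n_features)
multi = LinearRegression().fit(X, np.column_stack([y1, -y1]))
print(multi.coef_.shape)    # (2, 2)
```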
Notes
From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
And so I am happy to share it with fellow enthusiasts.