
Getting ready
Scaling data is extremely useful. Many machine learning algorithms perform differently (and sometimes incorrectly) when features exist on different scales. For example, SVMs perform poorly if the data isn't scaled because their optimization uses a distance function, which is biased when one feature varies from 0 to 10,000 and another varies from 0 to 1.
The preprocessing module contains several useful functions for scaling features:
from sklearn import preprocessing
import numpy as np # we'll need it later
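As a quick, minimal sketch of the scale problem described above (the toy array here is illustrative, not part of any dataset), preprocessing.scale standardizes each column to zero mean and unit variance:
# column 0 spans thousands, column 1 spans [0, 1]
toy = np.array([[1000., 0.01],
                [2000., 0.50],
                [9000., 0.99]])
print(preprocessing.scale(toy))              # each column standardized
print(preprocessing.scale(toy).std(axis=0))  # [1. 1.] -- equal footing
After scaling, a distance computed across the two columns weighs them equally instead of being dominated by the first.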
Load the Boston dataset:
from sklearn.datasets import load_boston
boston = load_boston()
X, y = boston.data, boston.target
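Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2; on recent versions, sklearn.datasets.fetch_california_housing is a similar regression dataset you can substitute. With X in hand, a minimal sketch of standardizing the features with StandardScaler (the estimator counterpart of the scale function) looks like this:
scaler = preprocessing.StandardScaler()
X_scaled = scaler.fit_transform(X)      # center each feature, divide by its std
print(X_scaled.mean(axis=0).round(6))   # approximately 0 for every feature
print(X_scaled.std(axis=0))             # 1 for every feature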