Mastering Machine Learning Algorithms
上QQ阅读APP看书,第一时间看更新

Ridge

Ridge regularization (also known as Tikhonov regularization) is based on the squared L2-norm of the parameter vector:

This penalty avoids an infinite growth of the parameters (for this reason, it's also known as weight shrinkage), and it's particularly useful when the model is ill-conditioned, or there is multicollinearity, due to the fact that the samples are completely independent (a relatively common condition).

In the following diagram, we see a schematic representation of the Ridge regularization in a bidimensional scenario:

 

Ridge (L2) regularization

The zero-centered circle represents the Ridge boundary, while the shaded surface is the original cost function. Without regularization, the minimum (w1, w2) has a magnitude (for example, the distance from the origin) which is about double the one obtained by applying a Ridge constraint, confirming the expected shrinkage. When applied to regressions solved with the Ordinary Least Squares (OLS) algorithm, it's possible to prove that there always exists a Ridge coefficient, so that the weights are shrunk with respect the OLS ones. The same result, with some restrictions, can be extended to other cost functions.