Introduction
Regression is one of the oldest, and still one of the most powerful, tools for mathematical modelling, classification, and prediction. It finds application in varied fields ranging from engineering, physical science, biology, and the financial market to the social sciences. It is a basic tool in the hands of a data scientist.
Regression is normally the first algorithm that people in machine learning work with. It allows us to make predictions from data by learning the relationship between the dependent and independent variables. For example, in the case of house price estimation, we determine the relationship between the area of the house (independent variable) and its price (dependent variable); this relationship can then be used to predict the price of any house given its area. We can have multiple independent variables impacting the dependent variable. Thus, there are two important components of regression: the relationship between the independent and dependent variables, and the strength of the impact of each independent variable on the dependent variable.
There are various types of regression methods available:
- Linear regression: This is one of the most widely used modelling techniques. Existing for more than 200 years, it has been explored from almost all possible angles. Linear regression assumes a linear relationship between the input variables (X) and the single output variable (Y). It involves finding a linear equation for the predicted value Yhat of the form:
$$ \hat{Y} = W^{T} X + b = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b $$
Here, X = (x1, x2, ..., xn) are the n input variables and W = (w1, w2, ..., wn) are the linear coefficients, with b as the bias term. The goal is to find the best estimates for the coefficients W, such that the error in the predicted Y is minimized. The linear coefficients W are estimated using the method of least squares, that is, by minimizing the sum of squared differences between the predicted values (Yhat) and the observed values (Y). Thus, we try to minimize the loss function:
$$ loss = \sum_{i} \left( Y_i - \hat{Y}_i \right)^2 $$
Here, the sum is over all the training samples. Depending on the number and type of input variable X, there are different types of linear regression: simple linear regression (one input variable, one output variable), multiple linear regression (many independent input variables, one output variable), or multivariate linear regression (many independent input variables and multiple output variables). For more on linear regression, you can refer to https://en.wikipedia.org/wiki/Linear_regression.
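To make the least-squares idea concrete, the following is a minimal sketch of simple linear regression in plain Python, using the closed-form solution for a single input variable. The function names and the area and price figures are hypothetical, chosen only for illustration; they are not real housing data.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w, b) that minimize sum((y - (w * x + b)) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def squared_error_loss(xs, ys, w, b):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: area (independent) and price (dependent).
# The prices were constructed as 3 * area + 10, so the fit is exact.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [160.0, 220.0, 280.0, 340.0]

w, b = fit_simple_linear_regression(areas, prices)
loss = squared_error_loss(areas, prices, w, b)
predicted_price = w * 100.0 + b  # prediction for an unseen area of 100
```

With more than one input variable the same loss applies, but the coefficients are usually found with matrix methods or gradient descent rather than this one-variable formula.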
- Logistic regression: This is used to determine the probability of an event. Conventionally, the event is represented as a categorical dependent variable. The probability of the event is expressed using the logistic function (also called the sigmoid function, the inverse of the logit):
$$ P(\hat{Y} = 1 \mid X) = \frac{1}{1 + e^{-(W^{T} X + b)}} $$
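The goal now is to estimate the weights W = (w1, w2, ..., wn) and the bias term b. In logistic regression, the coefficients are estimated using either the maximum likelihood estimator or stochastic gradient descent. The loss is conventionally defined as a cross-entropy term, where Yhat is the predicted probability and Y is the observed label (0 or 1), given as follows:
$$ loss = -\sum_{i} \left[ Y_i \log(\hat{Y}_i) + (1 - Y_i) \log(1 - \hat{Y}_i) \right] $$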
Logistic regression is used in classification problems. For example, given medical data, we can use logistic regression to classify whether a person has cancer or not. If the output categorical variable has more than two levels, we can use multinomial logistic regression. Another common technique for more than two output classes is one versus all. For multiclass logistic regression, the cross-entropy loss function is modified as follows:
$$ loss = -\sum_{i} \sum_{j=1}^{K} Y_{ij} \log(\hat{Y}_{ij}) $$
Here, K is the total number of classes. More about logistic regression can be read at https://en.wikipedia.org/wiki/Logistic_regression.
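As an illustration of the binary logistic model, the sketch below implements the sigmoid, the binary cross-entropy loss, and a basic stochastic gradient descent loop in plain Python. The function names, learning rate, epoch count, and toy data are assumptions made for this example; it is not a reference implementation.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Cross-entropy summed over samples; probs are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


def train_sgd(samples, labels, learning_rate=0.1, epochs=200):
    """Fit weights and bias by updating after every single sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, y in zip(samples, labels):
            # For cross-entropy with a sigmoid, d(loss)/dz = p - y.
            error = predict_proba(weights, bias, features) - y
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical one-feature data: small values are class 0, large are class 1.
samples = [[0.5], [1.0], [3.0], [3.5]]
labels = [0, 0, 1, 1]

initial_loss = binary_cross_entropy(
    labels, [predict_proba([0.0], 0.0, s) for s in samples])
weights, bias = train_sgd(samples, labels)
final_loss = binary_cross_entropy(
    labels, [predict_proba(weights, bias, s) for s in samples])
```

Minimizing this cross-entropy is equivalent to maximizing the likelihood of the observed labels, which is why the two estimation approaches mentioned above lead to the same objective.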
These are the two most commonly used regression techniques.
- Regularization: When there is a large number of input features, regularization is needed to keep the fitted model from becoming overly complex. Regularization helps prevent overfitting of the data. It can also be used to obtain a convex loss function. There are two types of regularization, L1 and L2, which are described in the following points:
- L1 regularization provides sparse solutions. It is very useful when the number of input features is extremely large. In L1 regularization, an additional penalty term dependent on the absolute sum of all the coefficients is added to the loss function. The regularization penalty term for L1 regularization is as follows:
$$ \lambda \sum_{i=1}^{n} |w_i| $$
- L2 regularization can also work when the data is highly collinear. In this case, the penalty term is the sum of the squares of all the coefficients:
$$ \lambda \sum_{i=1}^{n} w_i^{2} $$
In the penalty terms above, the Greek letter lambda (λ) is the regularization parameter.
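The two penalty terms can be written in a few lines of Python. In this sketch the helper names, the weight vector, and the value of λ (`lam`) are hypothetical; the penalty is simply added to whatever data loss the model already uses.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lambda times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lambda times the sum of squared coefficient values."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Add the chosen penalty to an already computed data loss."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    if kind == "l2":
        return data_loss + l2_penalty(weights, lam)
    raise ValueError("kind must be 'l1' or 'l2'")


# Hypothetical coefficients and regularization parameter.
weights = [0.5, -2.0, 0.0, 1.5]
lam = 0.1

l1_value = l1_penalty(weights, lam)   # 0.1 * (0.5 + 2.0 + 0.0 + 1.5)
l2_value = l2_penalty(weights, lam)   # 0.1 * (0.25 + 4.0 + 0.0 + 2.25)
total = regularized_loss(1.0, weights, lam, kind="l1")
```

A larger λ penalizes large coefficients more strongly, so its value is normally chosen by validation rather than fixed in advance.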