Mastering Machine Learning with R(Second Edition)
上QQ阅读APP看书,第一时间看更新

Classification methods and linear regression

So, why can't we just use the least square regression method that we learned in the previous chapter for a qualitative outcome? Well, as it turns out, you can, but at your own risk. Let's assume for a second that you have an outcome that you are trying to predict and it has three different classes: mild, moderate, and severe. You and your colleagues also assume that the difference between mild and moderate and moderate and severe is an equivalent measure and a linear relationship. You can create a dummy variable where 0 is equal to mild, 1 is equal to moderate, and 2 is equal to severe. If you have reason to believe this, then linear regression might be an acceptable solution. However, qualitative assessments such as the previous ones might lend themselves to a high level of measurement error that can bias the OLS. In most business problems, there is no scientifically acceptable way to convert a qualitative response to one that is quantitative. What if you have a response with two outcomes, say fail and pass? Again, using the dummy variable approach, we could code the fail outcome as 0 and the pass outcome as 1. Using linear regression, we could build a model where the predicted value is the probability of an observation of pass or fail. However, the estimates of Y in the model will most likely exceed the probability constraints of [0,1] and thus be a bit difficult to interpret.