Hands-On Python Deep Learning for the Web

Train, test, and validation sets

Any ML system needs data; without it, designing an ML system is practically impossible. We are not concerned with the quantity of data for now, but it is important to keep in mind that data is a prerequisite for building an ML system. Once we have data, we use it to train our ML system so that it can make predictions on new data (what exactly is predicted is a broad question and varies from problem to problem). The data used for training is known as the train set, and the data on which the system is tested is known as the test set. Before actually applying the model to the test data, we also tend to evaluate its performance on another set of data, called the validation set. Sometimes the data does not come in these neat partitions; instead, we get it in a raw, unpartitioned form, which we process and then split into these sets ourselves.
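Splitting raw data into the three partitions can be sketched as follows. This is a minimal illustration, assuming the data is an in-memory Python sequence and that a 70/15/15 train/validation/test split is acceptable; the function name `train_val_test_split` and its `val_frac`, `test_frac`, and `seed` parameters are hypothetical, chosen for illustration. In practice, scikit-learn's `train_test_split` (from `sklearn.model_selection`) is commonly called twice to obtain the same three partitions.

```python
import random


def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle `data` and partition it into train, validation, and test lists.

    Hypothetical helper for illustration: fractions and seed are assumptions.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to less than 1")

    items = list(data)                  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(items)  # a fixed seed makes the split reproducible

    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)

    test = items[:n_test]                    # held out until final evaluation
    val = items[n_test:n_test + n_val]       # used to check performance before testing
    train = items[n_test + n_val:]           # everything else is used for training
    return train, val, test


# Usage: 100 toy samples -> 70 train, 15 validation, 15 test
samples = list(range(100))
train, val, test = train_val_test_split(samples)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing matters because raw data is often ordered (by date, source, or class), and slicing it unshuffled would give the three sets different characteristics.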

Ideally, no instance should appear in more than one of these three sets, while the data in all three should follow the same distribution. Many researchers have since identified critical issues with these assumptions, which has led to approaches such as adversarial training; these are beyond the scope of this book.
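Both requirements (disjoint sets, same distribution) can be checked on a toy example. The sketch below uses a stratified split, which is one common way to keep the class proportions of a labeled dataset identical across sets; note that it only matches label proportions, not the full data distribution. The helpers `stratified_split` and `class_shares` are hypothetical names introduced for illustration, and the 80/20 imbalanced labels are made-up toy data. scikit-learn offers the same behavior through the `stratify` argument of `train_test_split`.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps the same share in both sets.

    Hypothetical helper for illustration only.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)  # group instance indices by class label

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)  # same fraction taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def class_shares(labels, idx):
    """Fraction of each class among the instances selected by `idx`."""
    counts = Counter(labels[i] for i in idx)
    return {label: counts[label] / len(idx) for label in sorted(counts)}


# Toy imbalanced labels: 80 instances of class 0, 20 of class 1
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)

assert not set(train_idx) & set(test_idx)  # no instance appears in both sets
print(class_shares(labels, train_idx))  # {0: 0.8, 1: 0.2}
print(class_shares(labels, test_idx))   # {0: 0.8, 1: 0.2}
```

A plain random split on such imbalanced data can, by chance, leave the test set with noticeably more or fewer minority-class instances than the train set; stratifying removes that source of mismatch.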