Mastering Java for Data Science

Dimensionality reduction

Another group of unsupervised learning algorithms is dimensionality reduction algorithms. These algorithms compress the dataset, keeping only the most useful information. If a dataset has too many features, it can be hard for a machine learning algorithm to use all of them at the same time: processing all the data may simply take too long. Compressing the data first makes processing faster.

Several algorithms can reduce the dimensionality of the data, including Principal Component Analysis (PCA), Locally Linear Embedding (LLE), and t-SNE. All of these are examples of unsupervised dimensionality reduction techniques.
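To make the idea of compression concrete, here is a minimal sketch of PCA in plain Java that reduces two-dimensional points to a single number each. It is an illustration under simplifying assumptions, not a production implementation: the class name `PcaSketch`, its helper methods, and the toy data are all hypothetical, only the first principal component is computed, and power iteration stands in for the full eigendecomposition a library would use.

```java
import java.util.Arrays;

// Hypothetical illustration class; not taken from any library.
class PcaSketch {

    /** Subtracts the mean of each column; returns a new matrix. */
    static double[][] center(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) out[i][j] = x[i][j] - mean[j];
        return out;
    }

    /** Sample covariance matrix of already-centered data. */
    static double[][] covariance(double[][] centered) {
        int n = centered.length, d = centered[0].length;
        double[][] cov = new double[d][d];
        for (double[] row : centered)
            for (int j = 0; j < d; j++)
                for (int k = 0; k < d; k++) cov[j][k] += row[j] * row[k] / (n - 1);
        return cov;
    }

    /**
     * Power iteration: approximates the eigenvector with the largest eigenvalue.
     * Assumes the covariance matrix is non-zero and that the starting vector
     * is not orthogonal to that eigenvector.
     */
    static double[] firstComponent(double[][] cov, int iterations) {
        int d = cov.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[d];
            for (int j = 0; j < d; j++)
                for (int k = 0; k < d; k++) next[j] += cov[j][k] * v[k];
            double norm = 0;
            for (double c : next) norm += c * c;
            norm = Math.sqrt(norm);
            for (int j = 0; j < d; j++) v[j] = next[j] / norm;
        }
        return v;
    }

    /** Projects each centered row onto the component: one number per row. */
    static double[] project(double[][] centered, double[] component) {
        double[] out = new double[centered.length];
        for (int i = 0; i < centered.length; i++)
            for (int j = 0; j < component.length; j++)
                out[i] += centered[i][j] * component[j];
        return out;
    }

    public static void main(String[] args) {
        // Toy data: the second feature is exactly twice the first.
        double[][] data = { {1, 2}, {2, 4}, {3, 6}, {4, 8} };
        double[][] centered = center(data);
        double[] pc = firstComponent(covariance(centered), 100);
        System.out.println(Arrays.toString(pc));                    // direction of largest variance
        System.out.println(Arrays.toString(project(centered, pc))); // compressed 1-D data
    }
}
```

Because the two features in the toy data are perfectly correlated, the single projected value per row keeps all of the variation in the original two columns. Real data is rarely this clean, so some information is normally lost in exchange for the smaller representation.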

Not all dimensionality reduction algorithms are unsupervised; some of them can use labels to reduce the dimensionality better. For example, many feature selection algorithms rely on labels to determine which features are useful and which are not.
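As one hedged illustration of label-based feature selection, the sketch below ranks features by the absolute Pearson correlation between each feature column and the label. This is only one simple scoring choice among many, and the class name `CorrelationFeatureRanking`, its methods, and the toy data are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; not taken from any library.
class CorrelationFeatureRanking {

    /** Pearson correlation between two equally long arrays (0 if either is constant). */
    static double pearson(double[] a, double[] b) {
        double meanA = Arrays.stream(a).average().orElse(0);
        double meanB = Arrays.stream(b).average().orElse(0);
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? 0 : cov / denom;
    }

    /** Returns feature indices sorted by |correlation with the label|, strongest first. */
    static int[] rankFeatures(double[][] x, double[] y) {
        int d = x[0].length;
        double[] score = new double[d];
        for (int j = 0; j < d; j++) {
            double[] column = new double[x.length];
            for (int i = 0; i < x.length; i++) column[i] = x[i][j];
            score[j] = Math.abs(pearson(column, y));
        }
        Integer[] idx = new Integer[d];
        for (int j = 0; j < d; j++) idx[j] = j;
        // Negate the score so that the ascending sort puts the strongest feature first.
        Arrays.sort(idx, Comparator.comparingDouble((Integer j) -> -score[j]));
        return Arrays.stream(idx).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        double[][] x = {
            {1, 5, 0},   // feature 0 tracks the label, feature 1 is constant,
            {2, 5, 1},   // feature 2 is only weakly related to the label
            {3, 5, 0},
            {4, 5, 1}
        };
        double[] y = {10, 20, 30, 40};
        System.out.println(Arrays.toString(rankFeatures(x, y))); // prints [0, 2, 1]
    }
}
```

Keeping only the top-ranked indices reduces the number of columns while using the labels to decide what to keep. Note that this scoring only detects linear relationships between a single feature and the label.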

We will talk more about dimensionality reduction in Chapter 5, Unsupervised Learning - Clustering and Dimensionality Reduction.