Label propagation
Label propagation is a family of semi-supervised algorithms based on a graph representation of the dataset. In particular, if we have N labeled points (with bipolar labels +1 and -1) and M unlabeled points (denoted by y=0), it's possible to build an undirected graph based on a measure of geometric affinity between samples. If G = {V, E} is the formal definition of the graph, the set of vertices corresponds to the samples, each one carrying a label in {-1, +1, 0}, while the edge set is based on an affinity matrix W (often called the adjacency matrix when the graph is unweighted), which depends only on the X values, not on the labels.
In the following graph, there's an example of such a structure:
Example of binary graph
In the preceding example graph, there are four labeled points (two with y=+1 and two with y=-1), and two unlabeled points (y=0). The affinity matrix is normally symmetric and square with dimensions equal to (N+M) x (N+M). It can be obtained with different approaches. The most common ones, also adopted by Scikit-Learn, are:
- k-Nearest Neighbors (we are going to discuss this algorithm in further detail in Chapter 8, Clustering Algorithms):

  $W_{ij} = \begin{cases} 1 & \text{if } x_j \in \mathrm{kNN}(x_i) \\ 0 & \text{otherwise} \end{cases}$
- Radial basis function kernel:

  $W_{ij} = e^{-\gamma \lVert x_i - x_j \rVert^2}$
In the radial basis function kernel, the parameter γ is sometimes expressed as the reciprocal of 2σ². Small γ values (corresponding to a large variance) increase the radius, including farther points and smoothing the class over many samples, while large γ values restrict the boundary to a subset that tends toward a single sample. In the k-nearest neighbors kernel, on the other hand, the parameter k controls the number of samples to consider as neighbors.
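To make the two affinity matrices concrete, here is a minimal sketch using Scikit-Learn; the toy dataset X and the values chosen for k and gamma are illustrative assumptions, not values prescribed by the algorithm:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.metrics.pairwise import rbf_kernel

# Toy dataset: 6 two-dimensional points (an arbitrary assumption)
rng = np.random.RandomState(1000)
X = rng.uniform(-1.0, 1.0, size=(6, 2))

# k-Nearest Neighbors affinity: W[i, j] = 1 if x_j is among the k
# nearest neighbors of x_i, 0 otherwise (symmetrized to keep the
# graph undirected)
k = 3
W_knn = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
W_knn = np.maximum(W_knn, W_knn.T)

# Radial basis function affinity: W[i, j] = exp(-gamma * ||x_i - x_j||^2)
gamma = 10.0
W_rbf = rbf_kernel(X, gamma=gamma)
```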
To describe the basic algorithm, we also need to introduce the degree matrix (D):

$D = \mathrm{diag}(d_1, d_2, \ldots, d_{N+M}) \quad \text{with} \quad d_i = \sum_{j} W_{ij}$
It is a diagonal matrix where each non-null element represents the degree of the corresponding vertex. In an unweighted graph, this is the number of incident edges; when W is weighted (as in the case of the radial basis function), it is the sum of the edge weights. The general idea of label propagation is to let each node propagate its label to its neighbors and to iterate the procedure until convergence.
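Continuing the previous sketch, the degree matrix can be computed directly from the affinity matrix (here, the RBF one):

```python
# Degree matrix: diagonal entries are the row sums of W, that is,
# the (weighted) degree of each vertex
D = np.diag(W_rbf.sum(axis=1))
```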
Formally, if we have a dataset containing both labeled and unlabeled samples:

$\bar{X} = \{x_1, \ldots, x_N, x_{N+1}, \ldots, x_{N+M}\} \quad \text{with} \quad Y = (y_1, \ldots, y_N, 0, \ldots, 0), \; y_i \in \{-1, +1\} \;\; \forall i \leq N$
The complete steps of the label propagation algorithm (as proposed in Zhu X., Ghahramani Z., Learning from Labeled and Unlabeled Data with Label Propagation, CMU-CALD-02-107) are:
- Select an affinity matrix type (KNN or RBF) and compute W
- Compute the degree matrix D
- Define $Y^{(0)} = Y$
- Define $Y_L = \{y_1, y_2, \ldots, y_N\}$
- Iterate until convergence of the following steps:

  $Y^{(t+1)} = D^{-1} W Y^{(t)}$

  $Y^{(t+1)}_L = Y_L$
The first update performs a propagation step with both labeled and unlabeled points: each label spreads from a node through its edges, and the corresponding weight, normalized by the degree, increases or decreases the effect of each contribution. The second command instead resets the y values of the labeled samples to their original labels. The final labels can be obtained as:

$y_i = \operatorname{sign}\left(y_i^{(\infty)}\right)$
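The whole procedure can be condensed into a short sketch. It reuses W_rbf and D from the previous snippets; the label layout (the first four points labeled and the last two unlabeled, mirroring the example graph), the tolerance, and the iteration cap are all assumptions:

```python
# Labels: 4 labeled points (+1, +1, -1, -1) and 2 unlabeled points (0)
Y = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])
N = 4
Y_L = Y[:N].copy()

# Propagation matrix D^-1 W (D is diagonal, so inversion is trivial)
P = np.linalg.inv(D) @ W_rbf

Y_t = Y.copy()
for _ in range(1000):
    Y_next = P @ Y_t      # propagation step
    Y_next[:N] = Y_L      # clamp the labeled samples
    if np.linalg.norm(Y_next - Y_t) < 1e-6:
        Y_t = Y_next
        break
    Y_t = Y_next

final_labels = np.sign(Y_t)  # bipolar labels for all points
```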
The proof of convergence is quite simple. If we partition the matrix $D^{-1}W$ according to the relationship between labeled and unlabeled samples, we get:

$D^{-1}W = \begin{pmatrix} A_{LL} & A_{LU} \\ A_{UL} & A_{UU} \end{pmatrix}$
If we consider that only the first N components of Y are non-null and that they are clamped at the end of each iteration, the matrix can be rewritten as:

$\begin{pmatrix} I & 0 \\ A_{UL} & A_{UU} \end{pmatrix}$
We are interested in proving the convergence for the part regarding the unlabeled samples (the labeled ones are fixed), so we can write the update rule as:

$Y_U^{(t+1)} = A_{UU} Y_U^{(t)} + A_{UL} Y_L$
Unrolling the recursion into an explicit expression, the previous formula becomes:

$Y_U^{(t)} = \left(\sum_{i=1}^{t} A_{UU}^{\,i-1}\right) A_{UL} Y_L + A_{UU}^{\,t} Y_U^{(0)}$
In the previous expression, the second term is null because the unlabeled samples are initialized with y=0, so we only need to prove that the first term converges. It's easy to recognize a truncated matrix geometric series (a Neumann series), and $A_{UU}$ is constructed so that all its eigenvalues satisfy $|\lambda_i| < 1$; therefore, for $t \to \infty$, the series converges to:

$Y_U = (I - A_{UU})^{-1} A_{UL} Y_L$
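As a quick sanity check, the closed-form fixed point can be computed directly and compared with the iterative result from the previous sketch; the block indices follow the assumed layout (first N rows labeled):

```python
# Closed-form solution: Y_U = (I - A_UU)^-1 A_UL Y_L
A = np.linalg.inv(D) @ W_rbf
A_UL = A[N:, :N]   # unlabeled rows, labeled columns
A_UU = A[N:, N:]   # unlabeled rows, unlabeled columns

Y_U = np.linalg.solve(np.eye(A_UU.shape[0]) - A_UU, A_UL @ Y_L)
# Y_U should match Y_t[N:] from the iterative sketch (up to the tolerance)
```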