The XOR problem
To illustrate the importance of depth in an ANN, we will look at a very simple problem that an ANN can solve only because it has more than one layer.
In the early days of working with artificial neurons, people did not cascade layers together as we do in modern ANNs; instead, they used a single layer, which was named the perceptron:
The perceptron is effectively just a dot product between an input vector and a set of learned weights, followed by a threshold, which means it is just a linear classifier: its decision boundary is a straight line (a hyperplane in higher dimensions).
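To make this concrete, here is a minimal sketch of a perceptron in NumPy, assuming a step activation and the classic perceptron learning rule; the `step`, `predict`, and `train` helpers are illustrative names, not taken from any particular library. Trained on the linearly separable AND problem, it converges:

```python
import numpy as np

def step(z):
    # Threshold activation: fires (1) when the weighted sum is positive.
    return (np.asarray(z) > 0).astype(int)

def predict(X, w, b):
    # The whole model is a dot product plus a bias, then a threshold,
    # so the decision boundary is the hyperplane w . x + b = 0.
    return step(X @ w + b)

def train(X, y, epochs=20, lr=0.1):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - int(xi @ w + b > 0)  # perceptron update rule
            w += lr * err * xi
            b += lr * err
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])  # the linearly separable AND problem

w, b = train(X, y_and)
print(predict(X, w, b))  # [0 0 0 1] -- AND is solved
```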
It was around the time of the first AI winter that people realized the weaknesses of the perceptron. Being just a linear classifier, it cannot solve simple nonlinear classification problems such as the Boolean exclusive-or (XOR) problem. To solve this issue, we needed to go deeper.
In the following image, we see several different Boolean logic problems. A linear classifier can solve the AND and OR problems, but it cannot solve XOR, because no single straight line separates the two XOR classes:
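To see why no line works, suppose a boundary w1·x1 + w2·x2 + b = 0 did separate the classes. The points (0, 1) and (1, 0) would require w2 + b > 0 and w1 + b > 0, while (0, 0) and (1, 1) would require b ≤ 0 and w1 + w2 + b ≤ 0; adding each pair gives both w1 + w2 + 2b > 0 and w1 + w2 + 2b ≤ 0, a contradiction. The following sketch (hypothetical, reusing the same update rule as the previous snippet) shows the practical symptom: the perceptron never gets all four XOR points right, no matter how long it trains:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

w, b = np.zeros(2), 0.0
for _ in range(1000):                   # far more epochs than AND needed
    for xi, yi in zip(X, y_xor):
        err = yi - int(xi @ w + b > 0)  # same perceptron update rule
        w += 0.1 * err * xi
        b += 0.1 * err

print((X @ w + b > 0).astype(int))  # never [0 1 1 0]: at least one
                                    # point is always misclassified
```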
This led people to the idea of cascading together layers of neurons that use nonlinear activations. Each layer can create nonlinear concepts based on the output of the previous layer. This “composition of concepts” allows networks to be more powerful and to represent more difficult functions; consequently, they are able to tackle nonlinear classification problems.
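As a concrete sketch of this idea, the following hypothetical two-layer network solves XOR with hand-set (not learned) weights: the hidden layer builds the intermediate concepts OR and NAND, and the output unit ANDs them together, since XOR(x1, x2) = OR(x1, x2) AND NAND(x1, x2):

```python
import numpy as np

def step(z):
    return (np.asarray(z) > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: the first unit computes OR, the second computes NAND.
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])

# Output layer: fires only when both hidden units fire (AND).
W2 = np.array([1.0, 1.0])
b2 = -1.5

h = step(X @ W1 + b1)  # nonlinear hidden concepts: [OR, NAND]
y = step(h @ W2 + b2)  # composing them gives XOR
print(y)               # [0 1 1 0]
```

In practice, these weights would be learned by backpropagation rather than set by hand, but hand-setting them makes the composition of concepts explicit: each hidden unit draws its own line, and the output unit combines the resulting half-planes into a region that no single line could produce.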