Hands-On Convolutional Neural Networks with TensorFlow

Calculating the receptive field

The receptive field is how much a particular convolution window "sees" of its input tensor.

Sometimes it is useful to know exactly how much of the input image each pixel in a particular layer's activation "sees". This is particularly important in object detection systems, because we need to map a layer's activations back to the corresponding region of the original image.

In the following image, we can see that the receptive field of three sequential 3x3 convolution layers (with stride 1) is the same as that of one 7x7 convolution layer: each additional 3x3 layer widens the field by two pixels, giving 3, 5, and then 7. This property was important when designing new and better CNN models, as we will see in later chapters.

The receptive field can be calculated as:

$$R_k = R_{k-1} + (K_k - 1) \prod_{i=1}^{k-1} s_i$$

Here, the components are as follows:

  • $R_k$: Receptive field of layer k
  • $K_k$: Kernel size at layer k
  • $s_i$: Stride of layer i (for i = 1..k-1)
  • $\prod_{i=1}^{k-1} s_i$: Product of all strides up to layer k-1 (all previous layers, not the current one)

For the first layer only, the receptive field is just the kernel size.
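The calculation can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function name `receptive_field` and the `(kernel_size, stride)` tuple representation are assumptions made for the example. It assumes the usual recurrence, in which each layer widens the receptive field by (kernel size - 1) times the product of the strides of all previous layers.

```python
def receptive_field(layers):
    """Return the receptive field after each layer.

    `layers` is a list of (kernel_size, stride) tuples ordered from the
    input towards the output. This representation is a hypothetical
    convenience for this sketch.
    """
    fields = []
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # product of the strides of all previous layers
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        fields.append(rf)
        jump *= stride  # the current stride only affects later layers
    return fields


# Three stacked 3x3 convolutions (stride 1) versus a single 7x7 convolution.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
print(receptive_field([(7, 1)]))                  # [7]
```

Starting from a receptive field of 1 makes the first layer come out as exactly its kernel size, so no special case is needed. The function never looks at the layer type, only at kernel size and stride.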

These calculations do not depend on whether we are using a convolution or a pooling layer. For example, a conv layer with stride 2 will have the same receptive field as a pooling layer with stride 2 and the same kernel size.

For instance, suppose a 14x14x3 image is passed through the following layers, where S is the stride, P is the padding, and K is the kernel size:

  • CONV: S:1, P:0, K:3
  • CONV: S:1, P:0, K:3
  • Max pool: S:2, P:0, K:2
  • CONV: S:1, P:0, K:3
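
As a rough check, the four layers above can be traced layer by layer. The sketch below is an illustration only: the `trace` helper and the `(name, kernel, stride, padding)` tuples are hypothetical names introduced here, and the output size uses the standard formula floor((W + 2P - K) / S) + 1, which is assumed rather than taken from this section.

```python
# (name, kernel, stride, padding) for the four layers listed above.
layers = [
    ("CONV", 3, 1, 0),
    ("CONV", 3, 1, 0),
    ("Max pool", 2, 2, 0),
    ("CONV", 3, 1, 0),
]


def trace(input_size, layers):
    """Return (name, output_size, receptive_field) for every layer."""
    rows = []
    size, rf, jump = input_size, 1, 1
    for name, kernel, stride, padding in layers:
        # Spatial width/height of this layer's output.
        size = (size + 2 * padding - kernel) // stride + 1
        # Receptive field grows by (kernel - 1) times the previous strides.
        rf += (kernel - 1) * jump
        jump *= stride
        rows.append((name, size, rf))
    return rows


for name, size, rf in trace(14, layers):
    print(f"{name}: output {size}x{size}, receptive field {rf}x{rf}")
```

Under these assumptions the spatial size goes 14 → 12 → 10 → 5 → 3, while the receptive field grows 3 → 5 → 6 → 10. The last convolution adds 4 rather than 2, because the stride-2 pooling layer before it doubles the distance in the input image between neighbouring activations.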