The Computer Vision Workshop

Introduction to OpenCV

OpenCV, also known as the Open Source Computer Vision Library, is the most commonly used computer vision library. Primarily written in C++, it's also commonly used in Python, thanks to its Python wrappers. Over the years, OpenCV has been through multiple revisions; its current version is 4.2.0, which is the version we are going to use in this book. What makes it different from other computer vision libraries is that it's fast and easy to use, it provides support for libraries such as Qt and OpenGL, and, most importantly, it provides hardware acceleration for Intel processors. These features make OpenCV an excellent choice for understanding the various concepts of computer vision and implementing them. Apart from OpenCV, we will also use NumPy for some basic computation and Matplotlib for visualization, wherever required.

Note

Refer to the Preface for NumPy and OpenCV installation instructions.

Let's start by understanding how images are represented in OpenCV in Python.

Images in OpenCV

OpenCV has its own class for representing images – cv::Mat. The "Mat" part comes from the term matrix. Now, this should not come as a surprise since images are nothing more than matrices. We already know that every image has three attributes specific to its dimensions – width, height, and the number of channels. We also know that every channel of an image is a collection of pixel values lying between 0 and 255. Notice how the channel of an image starts to look similar to a 2D matrix. So, an image becomes a collection of 2D matrices stacked on top of each other.

Refer to the following diagram for more details:

Figure 1.21: Image as 2D matrices stacked on top of each other

As a quick recap, while using OpenCV in Python, images are represented as NumPy arrays. NumPy is a Python module commonly used for numerical computation. A NumPy array looks like a 2D matrix, as we saw in Exercise 1.01, Creating NumPy Arrays. That's why an RGB image (which has three channels) will look like three 2D NumPy arrays stacked on top of each other.

We have restricted our discussion so far only to 2D arrays (which is good enough for grayscale images), but we know that our RGB images are not like 2D arrays. They not only have a height and a width; they also have one extra dimension – the number of channels in the image. That's why we can refer to RGB images as 3D arrays.

The only difference to the commands we discussed in the NumPy Arrays section is that we now have to add an extra dimension to the shape of the NumPy arrays – the number of channels. Since we know that RGB images have only three channels, the shape of the NumPy arrays becomes (number of rows, number of columns, 3).

Also, note that the order of elements in the shape of NumPy arrays follows this format: (number of rows, number of columns, 3). Here, the number of rows is equivalent to the height of the image, while the number of columns is equivalent to the width of the image. That's why the shape of the NumPy array can also be represented as (height, width, 3).
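As a quick illustrative sketch, we can create such a 3D array directly with NumPy (the array here is just a stand-in for a real image):

```python
import numpy as np

height, width = 4, 6
# A black color image: height x width pixels, 3 channels per pixel
img = np.zeros((height, width, 3), dtype=np.uint8)

print(img.shape)   # (4, 6, 3) -> (height, width, channels)
print(img.ndim)    # 3 -> a 3D array
```

Note how the number of rows (the height) comes first in the shape, exactly as described previously.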

Now that we know about how images are represented in OpenCV, let's go ahead and learn about some functions in OpenCV that we will commonly use.

Important OpenCV Functions

We can divide the OpenCV functions that we are going to use into the following categories:

  • Reading an image
  • Modifying an image
  • Displaying an image
  • Saving an image

Let's start with the function required for reading an image. The only function we will use for this is cv2.imread. This function takes the following arguments:

  • File name of the image we want to read/load
  • Flags for specifying what mode we want to read the image in

If we try to load an image that does not exist, the function returns None. This can be used to check whether the image was read successfully or not.

Currently, OpenCV supports formats such as .bmp, .jpeg, .jpg, .png, .tiff, and .tif. For the entire list of formats, you can refer to the documentation: https://docs.opencv.org/4.2.0/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56.

The last thing that we need to focus on regarding the cv2.imread function is the flag. There are only three flags that are commonly used for reading an image in a specific mode:

  • cv2.IMREAD_UNCHANGED: Reads the image as is. This means that a PNG image with a transparent background will be read as a BGRA image, where A denotes the alpha channel, which is responsible for transparency. (BGR refers to the blue, green, and red channels of an image.) Without this flag, the same image would be read as a BGR image, dropping the alpha channel. It's also worth noting that OpenCV uses BGR mode by default, which is why we discuss BGRA mode here rather than RGBA mode.
  • cv2.IMREAD_GRAYSCALE: Reading the image in grayscale format. This converts any color image into grayscale.
  • cv2.IMREAD_COLOR: This is the default flag and it reads any image as a color image (BGR mode).

    Note

    Note that OpenCV reads images in BGR mode rather than RGB mode. This means that the order of channels becomes blue, green, and red. Even with the other OpenCV functions that we will use, it is assumed that the image is in BGR mode.

Next, let's have a look at some functions we can use to modify an image. We will specifically discuss the functions for the following tasks:

  • Converting an image's color space
  • Splitting an image into various channels
  • Merging channels to form an image

Let's learn how we can convert the color space of an image. For this, we will use the cv2.cvtColor function. This function takes two inputs:

  • The image we want to convert
  • The color conversion flag, which looks as follows:

    cv2.COLOR_{CURRENT_COLOR_SPACE}2{NEW_COLOR_SPACE}

For example, to convert a BGR image into an HSV image, you will use cv2.COLOR_BGR2HSV. For converting a BGR image into grayscale, you will use cv2.COLOR_BGR2GRAY, and so on. You can view the entire list of such flags here: https://docs.opencv.org/4.2.0/d8/d01/group__imgproc__color__conversions.html.

Now, let's look at splitting and merging channels. Suppose you only want to modify the red channel of an image; you can first split the three channels (blue, green, and red), modify the red channel, and then merge the three channels again. Let's see how we can use OpenCV functions to split and merge channels:

  • For splitting the channels, we can use the cv2.split function. It takes only one argument – the image to be split – and returns the list of three channels – blue, green, and red.
  • For merging the channels, we can use the cv2.merge function. It takes only one argument – a tuple or list of the three channels (blue, green, and red) – and returns the merged image.

Next, let's look at the functions we will use for displaying an image. There are three main functions that we will be using for display purposes:

  • To display an image, we will use the cv2.imshow function. It takes two arguments. The first argument is a string, which is the name of the window in which we are going to display the image. The second argument is the image that we want to display.
  • After the cv2.imshow function is called, we use the cv2.waitKey function. This function specifies how long the program should wait for a key press while the window is displayed. Pass 0 if you want to wait indefinitely, until the user presses a key. Otherwise, pass the number of milliseconds the program should wait before moving to the next piece of code. For example, to wait for 10 milliseconds, you can use cv2.waitKey(10).
  • Without calling the cv2.waitKey function, the display window won't render properly. After execution moves on to the next piece of code, the window will still stay open (but will appear as if it's not responding). To close all the display windows, we can use the cv2.destroyAllWindows() function, which takes no arguments. It's recommended to close the display windows once they are no longer needed.

Finally, to save an image, we will use OpenCV's cv2.imwrite function. It takes two arguments:

  • A string that specifies the filename that we want to save the image with
  • The image that we want to save

Now that we know about the OpenCV functions that we are going to use in this chapter, let's get our hands dirty by using them in the next exercise.

Exercise 1.02: Reading, Processing, and Writing an Image

In this exercise, we will use the OpenCV functions that we looked at in the previous section to load the image of the lion in Figure 1.3, separate the red, green, and blue channels, display them, and finally save the three channels to disk.

Note

The image can be found at https://packt.live/2YOyQSv.

Follow these steps to complete this exercise:

  1. First of all, we will create a new notebook – Exercise1.02.ipynb. We will be writing our code in this notebook.
  2. Let's import the OpenCV module:

    import cv2

  3. Next, let's read the image of the lion and the girl. The image is present at the ../data/lion.jpg path:

    Note

    Before proceeding, ensure that you change the path to the image based on where the image is saved on your system.

    # Load image

    img = cv2.imread("../data/lion.jpg")

    Note

    The # symbol in the preceding code snippet denotes a code comment. Comments are added into code to help explain specific bits of logic.

  4. We will check whether we have read the image successfully or not by checking whether it is None:

    if img is None:

        print("Image not found")

  5. Next, let's display the image we have just read:

    # Display the image

    cv2.imshow("Lion",img)

    cv2.waitKey(0)

    cv2.destroyAllWindows()

    The output is as follows:

    Note

    Please note that whenever we are going to display an image using the cv2.imshow function, a new display window will pop up. This output will not be visible in the Jupyter Notebook and will be displayed in a separate window, as shown in the following figure.

    Figure 1.22: Lion image

  6. Now comes the processing step, where we will split the image into the three channels – blue, green, and red:

    # Split channels

    blue, green, red = cv2.split(img)

  7. Next, we can display the channels that we obtained in the preceding step. Let's start by displaying the blue channel:

    cv2.imshow("Blue",blue)

    cv2.waitKey(0)

    cv2.destroyAllWindows()

    The output is as follows:

    Figure 1.23: Lion image (blue channel)

  8. Next, let's display the green channel:

    cv2.imshow("Green",green)

    cv2.waitKey(0)

    cv2.destroyAllWindows()

    The output is as follows:

    Figure 1.24: Lion image (green channel)

  9. Similarly, we can display the red channel of the image:

    cv2.imshow("Red",red)

    cv2.waitKey(0)

    cv2.destroyAllWindows()

    The output is as follows:

    Figure 1.25: Lion image (red channel)

  10. Finally, to save the three channels we obtained, we will use the cv2.imwrite function:

    cv2.imwrite("Blue.png",blue)

    cv2.imwrite("Green.png",green)

    cv2.imwrite("Red.png",red)

    This will return True. This indicates that the images have been successfully written/saved on the disk. At this point, you can verify whether the three channels you have obtained match the images shown here:

Figure 1.26: Blue, green, and red channels obtained using our code

Note

To access the source code for this specific section, please refer to https://packt.live/2YQlDbU.

In the next section, we will discuss another library that is commonly used in computer vision – Matplotlib.

Using Matplotlib to Display Images

Matplotlib is a library that is commonly used in data science and computer vision for visualization purposes. The beauty of this library lies in the fact that it's very powerful and still very easy to use, similar to the OpenCV library.

In this section, we will have a look at how we can use Matplotlib to display images that have been read or processed using OpenCV. The only point that you need to keep in mind is that Matplotlib assumes the images will be in RGB mode, whereas OpenCV assumes the images will be in BGR mode. That's why we will be converting the image to RGB mode whenever we want to display it using Matplotlib.

There are two common ways to convert a BGR image into an RGB image:

  • Using OpenCV's cv2.cvtColor function and passing the cv2.COLOR_BGR2RGB flag. Let's imagine that we have an image loaded as img that we want to convert into RGB mode from BGR mode. This can be done using cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
  • The second method relies on the fact that converting a BGR image into an RGB image amounts to reversing the order of the channels. This can be done by replacing img with img[:,:,::-1], where ::-1 in the last position reverses the channel order. We will use this approach whenever we display images using Matplotlib, simply because it takes less code to write than option 1.

Now, let's have a look at the functions we are going to use to display images using Matplotlib.

First, we will import the matplotlib library, as follows. We will be using Matplotlib's pyplot module to create plots and display images:

import matplotlib.pyplot as plt

We will also be using the following magic command so that the images are displayed inside the notebook rather than in a new display window:

%matplotlib inline

Next, if we want to display a color image, we will use the following command, where we are also converting the image from BGR into RGB. We are using the same lion image as before and have loaded it as img:

plt.imshow(img[:,:,::-1])

Note

Execute this code in the same Jupyter Notebook where you executed Exercise 1.02, Reading, Processing, and Writing an Image.


Finally, to display the image, we will use the plt.show() command.

This will give us the following output:

Figure 1.27: Lion image
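Putting these pieces together, here is a minimal, self-contained sketch. The Agg backend, the synthetic image, and plt.savefig are all assumptions made so that the snippet runs outside a notebook; in a Jupyter Notebook, you would use %matplotlib inline and plt.show() instead:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook, use %matplotlib inline
import matplotlib.pyplot as plt

img = np.zeros((4, 4, 3), dtype=np.uint8)  # stand-in for cv2.imread(...)

plt.imshow(img[:, :, ::-1])  # reverse BGR -> RGB before display
plt.savefig("figure.png")    # in a notebook, plt.show() instead
```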

If we want to display a grayscale image, we will also have to specify the colormap as gray. This is because Matplotlib renders single-channel images with its default colormap (jet in older versions, viridis in newer ones) rather than in grayscale. You can see the difference between the two colormaps in the following plot:

plt.imshow(img, cmap="gray")

The plot looks as follows:

Figure 1.28: (Left) Image without colormap specified, (right) image with a gray colormap

Note

It's very important to note that an image displayed using Matplotlib appears inside the notebook, whereas an image displayed using OpenCV's cv2.imshow function appears in a separate window. Moreover, an image displayed using Matplotlib has gridlines, or X and Y axes, by default, which an image displayed using the cv2.imshow function does not. This is because Matplotlib is actually a (graph) plotting library, and that's why it displays the axes, whereas OpenCV is a computer vision library, where axes hold little importance. Irrespective of the presence or absence of the axes, the image itself stays the same, whether it's displayed using Matplotlib or OpenCV, and so all image processing steps stay the same as well. We will be using Matplotlib and OpenCV interchangeably in this book to display images, which means that sometimes you will find images with axes and sometimes without. In both cases, the axes don't hold any importance and can be ignored.

That's all it takes to display an image using Matplotlib in a Jupyter Notebook. In the next section, we will cover the final topic of this chapter – how to access and manipulate the pixels of an image.

Accessing and Manipulating Pixels

So far, we have discussed how to use OpenCV to read and process an image. But the image processing guide that we have covered so far was very basic and only constituted splitting and merging the channels of an image. Now, let's learn how to access and manipulate pixels, the building blocks of an image.

We can access and manipulate pixels based on their location. We'll learn how we can use the pixel locations in this section.

We already have covered how pixels are located using the coordinate system of an image. We also know that images in OpenCV in Python are represented as NumPy arrays. That's why the problem of accessing pixels becomes the general problem of accessing the elements of a NumPy array.

Let's consider a NumPy array, A, with m rows and n columns. If we want to access the elements present in row number i and column number j, we can do that using A[i][j] or A[i,j].

Similarly, if we want to extract the elements of a NumPy array, A, lying in rows a to b and columns c to d, we can do that using A[a:b, c:d]. (Note that chained indexing such as A[a:b][c:d] does not do this – the second bracket would index rows again.)

What if we wanted to extract the entire ith row of the array, A? We can do that using A[i, :], where : is used when we want the entire range of elements along that axis.

Similarly, if we want to extract the entire jth column, we can use A[:, j]. (Note that A[:][j] would not work here – it is equivalent to A[j] and returns the jth row.)

Manipulating pixels becomes very easy once you have managed to access the pixels you want. You can either change their values to a new value or copy the values from another pixel.
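A small sketch of these operations on a synthetic grayscale image:

```python
import numpy as np

img = np.zeros((3, 4), dtype=np.uint8)  # stand-in 3x4 grayscale image

img[1, 2] = 255          # set the pixel at row 1, column 2
img[0, :] = 100          # set the entire first row
img[2, 3] = img[1, 2]    # copy one pixel's value to another

print(img[1, 2])   # 255
print(img[0])      # [100 100 100 100]
print(img[2, 3])   # 255
```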

Let's learn how to use the preceding operations by completing a practical exercise.

Exercise 1.03: Creating a Water Effect

In this exercise, we will implement a water filter that is responsible for vertically flipping an object that is floating on a body of water. You can see this effect in the following image:

Figure 1.29: Water effect

The entire problem can be broken down into the following steps:

  1. Read an image.
  2. Flip the image vertically.
  3. Join the original image and the flipped image.
  4. Display and save the final image.

In this exercise, we will create a water effect using the concepts we have studied so far. We will be applying the same water effect to the lion.jpg image (Figure 1.3) we used earlier. Follow these steps to complete this exercise:

Note

The image can be found at https://packt.live/2YOyQSv.

  1. Import the required libraries – Matplotlib, NumPy, and OpenCV:

    import cv2

    import numpy as np

    import matplotlib.pyplot as plt

  2. We will also use the magic command to display images using Matplotlib in the notebook:

    %matplotlib inline

  3. Next, let's read the image and display it. The image is stored in the ../data/lion.jpg path:

    Note

    Before proceeding, ensure that you can change the path to the image (highlighted) based on where the image is saved in your system.

    # Read the image

    img = cv2.imread("../data/lion.jpg")

    # Display the image

    plt.imshow(img[:,:,::-1])

    plt.show()

    The output is as follows:

    Figure 1.30: Image output

  4. Let's find the shape of the image to understand what we are dealing with here:

    # Find the shape of the image

    img.shape

    The shape of the image is (407, 640, 3).

  5. Now comes the important part. We will have to create a new image with twice the number of rows (or twice the height) but the same number of columns (or width) and the same number of channels. This is because we want to add the mirrored image to the bottom of the image:

    # Create a new array with double the size

    # Height will become twice

    # Width and number of channels will

    # stay the same

    imgNew = np.zeros((814,640,3),dtype=np.uint8)

  6. Let's display this new image we created. It should be a completely black image at this point:

    # Display the image

    plt.imshow(imgNew[:,:,::-1])

    plt.show()

    The output is as follows:

    Figure 1.31: New black image that we have created using np.zeros

  7. Next, we will copy the original image to the top half of the image. The top half of the image corresponds to the first half of the rows of the new image:

    # Copy the original image to the

    # top half of the new image

    imgNew[:407, :] = img

  8. Let's look at the new image now:

    # Display the image

    plt.imshow(imgNew[:,:,::-1])

    plt.show()

    Here's the output of the show() method:

    Figure 1.32: Image after copying the top half of the image

  9. Next, let's vertically flip the original image. We can take some inspiration from how we reversed the channels using ::-1 in the last position. Since flipping the image vertically is equivalent to reversing the order of rows in the image, we will use ::-1 in the first position:

    # Invert the image

    imgInverted = img[::-1,:,:]

  10. Display the inverted image, as follows:

    # Display the image

    plt.imshow(imgInverted[:,:,::-1])

    plt.show()

    The inverted image looks as follows:

    Figure 1.33: Image obtained after vertically flipping the original image

  11. Now that we have the flipped image, all we have to do is copy this flipped image to the bottom half of the new image:

    # Copy the inverted image to the

    # bottom half of the new image

    imgNew[407:, :] = imgInverted

  12. Display the new image, as follows:

    # Display the image

    plt.imshow(imgNew[:,:,::-1])

    plt.show()

    The output is as follows:

    Figure 1.34: Water effect

  13. Let's save the image that we have just created:

    # Save the image

    cv2.imwrite("WaterEffect.png",imgNew)

In this exercise, you saw how the toughest-looking tasks can sometimes be completed using the very basics of a topic. Using our basic knowledge of NumPy arrays, we were able to generate a very beautiful-looking image.

Note

To access the source code for this specific section, please refer to https://packt.live/2VC7QDL.

Let's test what we have learned so far by completing the following activity, where we will create a mirror effect. One difference between the water effect and the mirror effect image will be that the mirror effect will be laterally inverted. Moreover, we will also be introducing an additional negative effect to the mirror image. This negative effect gets its name from the image negatives that are used while processing photographs. You can see the effect of the mirror image by looking at the following figure.


Activity 1.01: Mirror Effect with a Twist

Creating a plain mirror effect is simple, so let's add a twist to it. We want to replicate the effect shown in the following figure. Effects like these are useful when we want to create apps such as Snapchat, Instagram, and so on. For example, the water effect, the mirror effect, and so on are quite commonly used as filters. We covered the water effect in the previous exercise. Now, we will create a mirror effect:

Figure 1.35: Mirror effect

Before you read the detailed instructions, think about how you would create such an effect. Notice the symmetry in the image, that is, the mirror effect. The most important part of this activity is to generate the image on the right. Let's learn how we can do that. We will be using the same image of the lion and girl that we used in the previous exercises.

Note

The image can be found at https://packt.live/2YOyQSv.

Follow these steps to complete this activity:

  1. First, load the required modules – OpenCV, Matplotlib, and NumPy.
  2. Next, write the magic command to display the images in the notebook.
  3. Now, load the image and display it using Matplotlib.
  4. Next, obtain the shape of the image.
  5. Now comes the most interesting part. Convert the image's color space from BGR into HSV and display the HSV image. The image will look as follows:

    Figure 1.36: Image converted into the HSV color space

  6. Next, extract the value channel from the HSV color space. Note that the value channel is the last channel of the HSV image. You can use the cv2.split function for this. Display the value channel. The image will look as follows:

    Figure 1.37: Value channel of the HSV image

  7. Now comes another interesting part. We will create a negative effect on the image. This is similar to what you see in the negatives of the images you click. To carry out this effect, all you have to do is subtract the value channel from 255. Then, display the new value channel. The image will look as follows:

    Figure 1.38: Negative of the value channel

  8. Next, create a new image by merging the value channel with itself. This can be done by using cv2.merge((value, value, value)), where value refers to the negative of the value channel you obtained in the preceding step. We are doing this because we want to merge two three-channel images to create the final effect.
  9. Next, flip the new image you obtained previously, horizontally. You can refer to the flipping step we did in Exercise 1.03, Creating a Water Effect. Note that flipping horizontally is equivalent to reversing the order of the columns of an image. The output will be as follows:

    Figure 1.39: Image flipped horizontally

  10. Now, create a new image with twice the width as the original image. This means that the number of rows and the number of channels will stay the same. Only the number of columns will be doubled.
  11. Now, copy the original image to the left half of the new image. The image will look as follows:

    Figure 1.40: Copying the original image to the left half

  12. Next, copy the horizontally flipped image to the right half of the new image and display the image.
  13. Finally, save the image you have obtained.

    The final image will look as follows:

Figure 1.41: Final image

Note

The solution to this activity can be found on page 466.

In this activity, you used the concepts you studied in the previous exercise to generate a really interesting result. By completing this activity, you have learned how to convert the color space of an image and use a specific channel of the new image to create a mirror effect.