The Computer Vision Workshop
上QQ阅读APP看书,第一时间看更新

Introduction

Welcome to the world of images. It's interesting, it's wide, and most importantly, it's colorful. The world of artificial intelligence (AI) is impacting how we, as humans, can use the power of smart computers to perform tasks much faster, more efficiently, and with minimal effort. The idea of imparting human-like intelligence to computers (known as AI) is a really interesting concept. When the intelligence is focused on images and videos, the domain is referred to as computer vision. Similarly, natural language processing (NLP) is the AI stream where we try to understand the meaning behind the text. This technology is used by major companies for building AI-based chatbots designed to interact with customers. Both computer vision and NLP share the concepts of deep learning, where we use deep neural networks to complete tasks such as object detection, image classification, word embedding, and more. Coming back to the topic of computer vision, companies have come up with interesting use cases where AI systems have managed to change entire scenarios. Google, for example, came up with the idea of Google Goggles, which can perform several operations, such as image classification, object recognition, and more. Similarly, Tesla's self-driving cars use computer vision extensively to detect pedestrians and vehicles on the road and to detect the lane on which the car is moving.

This book will serve as a journey through the interesting components of computer vision. We will start by understanding images and then go over how they can be processed. After a couple of chapters, we will jump into detailed topics such as histograms and contours and finally go over some real-life applications of computer vision – face processing, object detection, object tracking, 3D reconstruction, and so on. This is going to be a long journey, but we will get through it together.

We love looking at high-resolution color photographs, thanks to the gamut of colors they offer. Not so long ago, however, we had photos printed only in black and white. However, those "black-and-white" photos also had some color in them, the only difference being that the colors were all shades of gray. The common thing that's there in all these components is the vision part. That's where computer vision gets its name. Computer refers to the fact that it's the computer that is processing the visual data, while vision refers to the fact that we are dealing with visual data – images and videos.

An image is made up of smaller components called pixels. A video is made up of multiple frames, each of which is nothing but an image. The following diagram gives us an idea of the various components of videos, images, and pixels:

Figure 1.1: Relationships between videos, images, and pixels

Figure 1.1: Relationships between videos, images, and pixels

In this chapter, we will focus only on images and pixels. We will also go through an introduction to the OpenCV library, along with the functions present in the library that are commonly used for basic image processing. Before we jump into the details of images and pixels, let's get through the prerequisites, starting with NumPy arrays. The reason behind this is that images in OpenCV are nothing but NumPy arrays. Just as a quick recap, NumPy is a Python module that's used for numerical computations and is well known for its high-speed computations.