Introduction to PyTorch
PyTorch is an open source library developed mainly by Facebook's artificial intelligence research group as a Python version of Torch.
Note
Torch is an open source, scientific computing framework that supports a wide variety of machine learning algorithms.
PyTorch was first released to the public in January 2017. It uses the power of GPUs to speed up the computation of tensors, which shortens the training times of complex models.
The library has a C++ backend, built on the deep learning framework of Torch, which allows much faster computations than native Python libraries while offering many deep learning features. The frontend is in Python, which has helped it gain popularity by enabling data scientists new to the library to construct complex neural networks. PyTorch can also be used alongside other popular Python packages.
Although PyTorch is fairly new, it has gained popularity quickly, as it was developed using feedback from many experts in the field. This has made PyTorch a useful library for its users.
GPUs in PyTorch
GPUs were originally developed to speed up computations in graphics rendering, especially for video games. However, they have recently become increasingly popular thanks to their ability to speed up computations in many other fields, including deep learning.
There are several platforms that allow the allocation of variables to the GPUs of a machine, with the Compute Unified Device Architecture (CUDA) being one of the most commonly used platforms. CUDA is a computing platform developed by Nvidia that speeds up compute-intensive programs thanks to the use of GPUs to perform computations.
In PyTorch, variables can be allocated to CUDA by calling their to() method, as shown in the following code snippet:
x = torch.Tensor(10).random_(0, 10)
x = x.to("cuda")
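Here, the first line of code creates a tensor of 10 random whole numbers from 0 to 9 (the upper bound is exclusive), stored as floats. The second line of code allocates that tensor to CUDA so that all computations involving it are handled by the GPU instead of the CPU. Note that to() returns a new tensor rather than moving the original in place, so its result must be assigned to a variable. To allocate a variable back to the CPU, use the following code snippet: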
x = x.to("cpu")
When solving a deep learning problem on a GPU, it is good practice to allocate both the model holding the network architecture and the input data to CUDA. This ensures that all computations carried out during the training process are handled by the GPU.
Nevertheless, this allocation is only possible if your machine has a GPU available and you have installed PyTorch with CUDA support. To verify whether you can allocate your variables in CUDA, use the following code snippet:
torch.cuda.is_available()
If the output from the preceding line of code is True, you are all set to start allocating your variables in CUDA.
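A common pattern, sketched here under the assumption that the same code should also run on machines without a GPU, is to choose the device once and fall back to the CPU when CUDA is unavailable. Both the model and its input data are then moved to that device. The toy nn.Linear(3, 1) model and the batch of 4 samples are arbitrary illustration values, not part of this chapter's code.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Ten whole numbers from 0 to 9 (random_ excludes the upper bound), stored as floats
x = torch.Tensor(10).random_(0, 10)

# .to() returns a tensor on the target device, so the result is reassigned
x = x.to(device)

# Hypothetical toy model and inputs, both placed on the same device
model = nn.Linear(3, 1).to(device)
inputs = torch.randn(4, 3).to(device)
outputs = model(inputs)

print(x.device, outputs.device)
```

Keeping the model and the data on the same device matters because PyTorch raises an error when an operation mixes CPU and CUDA tensors.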
Note
To install PyTorch along with the CUDA package, visit PyTorch's website and make sure you select an option that includes CUDA (either version): https://pytorch.org/get-started/locally/.
What Are Tensors?
Similar to NumPy, PyTorch uses tensors to represent data. Tensors are matrix-like structures of n dimensions, with the difference that PyTorch tensors can run on the GPU (while NumPy arrays cannot), which helps to accelerate numerical computations. For tensors, the number of dimensions is also known as the rank. The following diagram shows a visual representation of tensors of different dimensions:
In contrast to a matrix, a tensor is a mathematical entity contained in a structure that can interact with other mathematical entities. When one tensor transforms another, the former also carries a transformation of its own.
This means that tensors are not just data structures, but rather containers that, when fed some data, can map in a multi-linear manner with other tensors.
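Because PyTorch tensors play the same role as NumPy arrays, moving data between the two libraries is straightforward. The following sketch uses torch.from_numpy() and the .numpy() method; the array values are arbitrary. Both conversions work on CPU tensors only, so a GPU tensor would first need to be moved back with .to("cpu").

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]])

# from_numpy creates a tensor that shares memory with the NumPy array
tensor = torch.from_numpy(array)

# .numpy() converts a CPU tensor back into a NumPy array
back = tensor.numpy()

# The rank (number of dimensions) is the same in both libraries
print(array.ndim, tensor.dim())  # 2 2

# Since memory is shared, an in-place change to the array shows up in the tensor
array[0, 0] = 10.0
print(tensor[0, 0].item())  # 10.0
```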
Similar to NumPy arrays or any other matrix-like structure, PyTorch tensors can have as many dimensions as desired. Defining a one-dimensional tensor (tensor_1) and a two-dimensional tensor (tensor_2) in PyTorch can be achieved using the following code snippet:
tensor_1 = torch.tensor([1,1,0,2])
tensor_2 = torch.tensor([[0,0,2,1,2],[1,0,2,2,0]])
Note that the numbers in the preceding code snippet have no particular meaning. What matters is the definition of the different dimensions, which are filled with arbitrary numbers. From the preceding snippet, the first tensor has a size of 4 in its single dimension, while the second one has a size of 2 in its first dimension and 5 in its second, which can be verified by using the shape property of the tensor variables, as seen here:
tensor_1.shape
The output is torch.Size([4]).
tensor_2.shape
The output is torch.Size([2, 5]).
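Besides shape, a few related methods help when checking the dimensions of a tensor. The following sketch recreates the two tensors from above: dim() returns the rank, shape can be indexed like a tuple, size() is equivalent to shape, and numel() counts the total number of elements.

```python
import torch

tensor_1 = torch.tensor([1, 1, 0, 2])
tensor_2 = torch.tensor([[0, 0, 2, 1, 2], [1, 0, 2, 2, 0]])

print(tensor_1.dim(), tensor_2.dim())        # 1 2  (rank of each tensor)
print(tensor_2.shape[0], tensor_2.shape[1])  # 2 5  (size of each dimension)
print(tensor_2.size())                       # torch.Size([2, 5])
print(tensor_2.numel())                      # 10  (2 * 5 elements)
```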
When using a GPU-enabled machine, a tensor can be defined on the GPU with the following modification:
tensor = torch.tensor([1,1,0,2]).cuda()
Creating dummy data using PyTorch tensors is fairly simple, similar to what you would do in NumPy. For instance, torch.randn() returns a tensor of the specified dimensions filled with random numbers drawn from a standard normal distribution, while torch.randint() returns a tensor of the specified dimensions filled with random integers between a minimum and a maximum value:
Note
The code snippet shown here uses a backslash ( \ ) to split the logic across multiple lines. When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
example_1 = torch.randn(3,3)
example_2 = torch.randint(low=0, high=2, \
size=(3,3)).type(torch.FloatTensor)
As can be seen, example_1 is a two-dimensional tensor filled with random numbers, with each dimension of size equal to 3, while example_2 is a two-dimensional tensor filled with 0s and 1s (the high parameter is upper-bound exclusive), with each dimension's size equal to 3.
Tensors filled with integers generally need to be converted into floats before being fed to a PyTorch model, since layers such as nn.Linear store their weights as floats.
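The following sketch shows why this conversion matters: an nn.Linear layer holds float32 weights, so calling it on an integer tensor raises a RuntimeError, while the converted tensor works. The layer size is an arbitrary example, and .float() is used as a shorthand that, for CPU tensors, gives the same result as .type(torch.FloatTensor).

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)  # weights are float32 by default

ints = torch.randint(low=0, high=2, size=(3, 3))  # dtype torch.int64
floats = ints.float()                             # same values, dtype torch.float32

try:
    layer(ints)
    int_input_failed = False
except RuntimeError:
    # The integer input does not match the float weights of the layer
    int_input_failed = True

output = layer(floats)
print(ints.dtype, floats.dtype, int_input_failed, output.shape)
```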
Exercise 1.01: Creating Tensors of Different Ranks Using PyTorch
In this exercise, we will use the PyTorch library to create tensors of ranks one, two, and three. Perform the following steps to complete this exercise:
Note
For the exercises and activities in this chapter, you will need to have Python 3.7, Jupyter 6.0, Matplotlib 3.1, and PyTorch 1.3+ (preferably PyTorch 1.4, with or without CUDA) installed (as instructed in the Preface). They will be primarily developed in a Jupyter Notebook and it is recommended that you keep a separate notebook for different assignments unless advised not to.
- Import the PyTorch library called torch:
import torch
- Create tensors of the following ranks: 1, 2, and 3.
Use values between 0 and 1 to fill your tensors. The size of the tensors can be defined as you wish, given that the ranks are created correctly:
tensor_1 = torch.tensor([0.1,1,0.9,0.7,0.3])
tensor_2 = torch.tensor([[0,0.2,0.4,0.6],[1,0.8,0.6,0.4]])
tensor_3 = torch.tensor([[[0.3,0.6],[1,0]], \
[[0.3,0.6],[0,1]]])
If your machine has a GPU available, you can create equivalent tensors using the GPU syntax:
tensor_1 = torch.tensor([0.1,1,0.9,0.7,0.3]).cuda()
tensor_2 = torch.tensor([[0,0.2,0.4,0.6], \
[1,0.8,0.6,0.4]]).cuda()
tensor_3 = torch.tensor([[[0.3,0.6],[1,0]], \
[[0.3,0.6],[0,1]]]).cuda()
- Print the shape of each of the tensors using the shape property, just as you would do with NumPy arrays:
print(tensor_1.shape)
print(tensor_2.shape)
print(tensor_3.shape)
The output of the print statements should look as follows, considering that the size of each dimension of the tensors may vary according to your choices:
torch.Size([5])
torch.Size([2, 4])
torch.Size([2, 2, 2])
Note
To access the source code for this specific section, please refer to https://packt.live/3dOS66H.
You can also run this example online at https://packt.live/2VwTLHq. You must execute the entire Notebook in order to get the desired result.
To access the GPU version of this source code, please refer to https://packt.live/31AwIzo. This version of the source code is not available as an online interactive example, and will need to be run locally with the GPU setup.
You have successfully created tensors of different ranks.
In the next section, we will discuss the advantages and disadvantages of using PyTorch.
Advantages of Using PyTorch
There are several libraries nowadays that can be used to develop deep learning solutions, so why use PyTorch? The answer is that PyTorch is a dynamic library that allows its users great flexibility to develop complex architectures that can be adapted to a particular data problem.
PyTorch has been adopted by many researchers and artificial intelligence developers, which makes it an important tool to have in a machine learning engineer's toolkit.
The key aspects to highlight are as follows:
- Ease of use: With respect to the API, PyTorch has a simple interface that makes it easy to develop and run models. Many early adopters consider it to be more intuitive than other libraries, such as TensorFlow.
- Speed: The use of GPUs allows the library to train models quickly, in many cases faster than other deep learning libraries. This is especially useful when different approaches have to be tested in order to achieve the best possible model. Additionally, even though other libraries may also be able to accelerate computations with GPUs, in PyTorch this can be done by typing just a couple of simple lines of code.
- Convenience: PyTorch is flexible. It uses dynamic computational graphs that allow you to make changes to networks on the go. It also allows great flexibility when building the architecture as it is easy to make adjustments to conventional architectures.
- Imperative: PyTorch is also imperative. Each line of code is executed individually, allowing you to track the model in real time, as well as debug the model in a convenient way.
- Pretrained models: Finally, it contains many pretrained models that are easy to use and are a great starting point for some data problems.
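To illustrate the dynamic computational graphs mentioned in the list above, here is a minimal sketch in which the number of operations depends on the data at run time; autograd still records whichever path was taken. The doubling loop and the threshold of 10 are arbitrary choices made for illustration.

```python
import torch

def grow(x):
    # Ordinary Python control flow: the loop length depends on the data
    steps = 0
    while x.norm() < 10:
        x = x * 2
        steps += 1
    return x, steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
y, steps = grow(x)
y.sum().backward()

# The graph recorded 'steps' doublings, so each gradient equals 2 ** steps
print(steps, x.grad)  # 3 tensor([8., 8.])
```

Because the graph is rebuilt on every call, a different input could lead to a different number of doublings, and the gradients would follow that new graph.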
Disadvantages of Using PyTorch
Although the advantages are many, there are still some disadvantages to consider, which are explained here:
- Small community: The community of adopters of this library is small compared to that of other libraries, such as TensorFlow. However, despite having been available to the public for only 3 years, it is today among the five most popular libraries for implementing deep learning solutions, and its community is growing by the day.
- Spotty documentation: Because the library is fairly new compared to other deep learning libraries, its documentation is not as complete. However, as the features and capabilities of the library increase, the documentation is being extended. Additionally, as the community continues to grow, more information will become available on the internet.
- Questions around production-readiness: Many of the complaints about the library have focused on its inability to be deployed in production. However, since the launch of version 1.0, the library has included production capabilities that allow finalized models to be exported and used in production environments.
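One such export mechanism is TorchScript. The following sketch traces a small model, serializes it to an in-memory buffer, and loads it back; the model architecture and input sizes are arbitrary examples, and newer PyTorch releases also offer other export tools that are not covered here.

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # arbitrary example model
model.eval()

example = torch.randn(2, 4)
# Tracing runs the model once on an example input and records a TorchScript program
scripted = torch.jit.trace(model, example)

# Serialize to an in-memory buffer (a file path works the same way)
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

# The loaded module can run without the original Python model definition
loaded = torch.jit.load(buffer)
with torch.no_grad():
    same = torch.allclose(model(example), loaded(example))
print(same)  # True
```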
Key Elements of PyTorch
Like any other library, PyTorch has a variety of modules, libraries, and packages for developing different functionalities. In this section, the three most commonly used elements for building deep neural networks will be explained, along with a simple example of the syntax.
The PyTorch autograd Library
The autograd library implements a technique called automatic differentiation. Its purpose is to calculate the derivatives of a function by applying the chain rule to the elementary operations that make it up. This is crucial for a concept we will learn about in the next chapter called backward propagation, which is carried out while training a neural network.
The derivative of a function refers to its rate of change with respect to one of its inputs, and the gradient gathers these derivatives for all of the inputs. In deep learning, gradients determine the direction and magnitude of the updates that must be applied to the parameters of the neural network in a training step in order to minimize the loss function. This concept will be further explored in the following chapter.
Note
A detailed explanation of neural networks and the different steps taken to train a model will be given in subsequent sections.
To compute the gradients, simply call the backward() function, as shown here:
a = torch.tensor([5.0, 3.0], requires_grad=True)
b = torch.tensor([1.0, 4.0])
ab = ((a + b) ** 2).sum()
ab.backward()
In the preceding code, two tensors were created. The requires_grad argument tells PyTorch to calculate the gradients with respect to that tensor. However, when building your neural network, this argument is not needed, because the parameters of PyTorch's layers already track gradients by default.
Next, a function was defined using the values of both tensors. Finally, the backward() function was used to calculate the gradients.
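For this particular function, the gradient that backward() computes can also be worked out by hand, which gives the values to expect when printing it:

```latex
f(\mathbf{a}) = \sum_{i} (a_i + b_i)^2
\quad\Longrightarrow\quad
\frac{\partial f}{\partial a_i} = 2\,(a_i + b_i)

\nabla_{\mathbf{a}} f = 2\,\big(5 + 1,\; 3 + 4\big) = (12,\; 14)
```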
By printing the gradients for both a and b, it is possible to confirm that they were only calculated for the first variable (a). For the second one (b), no gradient was stored, so trying to access it throws an error:
print(a.grad.data)
The output is tensor([12., 14.]).
print(b.grad.data)
The output is as follows:
AttributeError: 'NoneType' object has no attribute 'data'
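A more defensive way to inspect gradients is to check whether .grad is None before using it, which avoids the AttributeError. The following sketch reruns the preceding example and compares the result with the analytical gradient, 2(a + b).

```python
import torch

a = torch.tensor([5.0, 3.0], requires_grad=True)
b = torch.tensor([1.0, 4.0])  # requires_grad defaults to False
ab = ((a + b) ** 2).sum()
ab.backward()

# The derivative of sum((a + b)^2) with respect to a is 2 * (a + b)
expected = 2 * (a.detach() + b)
print(a.grad, torch.allclose(a.grad, expected))  # tensor([12., 14.]) True

# b was not tracked, so no gradient was stored for it
print(b.grad is None)  # True
```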
The PyTorch nn Module
The autograd library alone can be used to build simple neural networks, considering that the trickier part (the calculation of gradients) has been taken care of. However, this methodology can be troublesome, hence the introduction of the nn module.
The nn module is a complete PyTorch module used to create and train neural networks, which, through the use of different elements, allows for simple and complex developments. For instance, the Sequential() container allows for the easy creation of network architectures that follow a sequence of predefined modules (or layers) without the need for much knowledge of defining network architectures.
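To see why the nn module is convenient, here is a minimal sketch of a single linear layer trained with autograd alone: the parameters, the loss, and the update rule all have to be written by hand. The synthetic data, the sizes, the learning rate of 0.1, and the 100 steps are arbitrary illustration values.

```python
import torch

torch.manual_seed(0)
x = torch.randn(20, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y = x @ true_w  # targets generated by a known linear rule

# Parameters created and tracked manually
w = torch.zeros(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

first_loss = None
for step in range(100):
    y_pred = x @ w + b
    loss = ((y_pred - y) ** 2).mean()  # hand-written MSE
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()
    with torch.no_grad():
        # Manual gradient descent update, then reset the stored gradients
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(first_loss, loss.item())
```

With the nn module, the parameter creation, the loss, and (together with the optim package) the update would each shrink to a single line.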
Note
The different layers that can be used for each neural network architecture will be explained further in subsequent chapters.
This module also has the capability to define the loss function to evaluate the model and many more advanced features that will be discussed in this book.
The process of building a neural network architecture as a sequence of predefined modules can be achieved in just a couple of lines, as shown here:
import torch.nn as nn
model = nn.Sequential(nn.Linear(input_units, hidden_units), \
nn.ReLU(), \
nn.Linear(hidden_units, output_units), \
nn.Sigmoid())
loss_funct = nn.MSELoss()
First, the module is imported. Then, the model architecture is defined. input_units refers to the number of features that the input data contains, hidden_units refers to the number of nodes of the hidden layer, and output_units refers to the number of nodes of the output layer.
As can be seen in the preceding code, the architecture of the network contains one hidden layer, followed by a ReLU activation function and an output layer, followed by a sigmoid activation function, making it a two-layer network.
Finally, the loss function is defined as the Mean Squared Error (MSE).
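The preceding snippet assumes that input_units, hidden_units, and output_units already exist. Assuming example values of 10, 5, and 1 (arbitrary choices), the model can be called on a batch of dummy data and scored with the loss function as follows:

```python
import torch
import torch.nn as nn

input_units, hidden_units, output_units = 10, 5, 1  # example sizes (assumed)

model = nn.Sequential(nn.Linear(input_units, hidden_units),
                      nn.ReLU(),
                      nn.Linear(hidden_units, output_units),
                      nn.Sigmoid())
loss_funct = nn.MSELoss()

x = torch.randn(8, input_units)                     # batch of 8 dummy samples
y = torch.randint(0, 2, (8, output_units)).float()  # dummy 0/1 targets

y_pred = model(x)             # shape (8, 1), values in (0, 1) due to the sigmoid
loss = loss_funct(y_pred, y)  # a single scalar tensor
print(y_pred.shape, loss.item())
```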
Note
The most popular loss functions for different data problems will be explained throughout this book.
To create models that do not follow a sequence of existing modules, custom nn modules are used. We'll introduce these later in this book.
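As a brief preview, here is a sketch of how the two-layer network defined earlier might be written as a custom nn.Module subclass. The class name TwoLayerNet and the layer sizes are hypothetical; the layers are declared in __init__ and the computation is described in forward().

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):  # hypothetical name
    def __init__(self, input_units, hidden_units, output_units):
        super().__init__()
        self.hidden = nn.Linear(input_units, hidden_units)
        self.output = nn.Linear(hidden_units, output_units)

    def forward(self, x):
        # forward() can contain arbitrary Python logic, not just a fixed sequence
        x = torch.relu(self.hidden(x))
        return torch.sigmoid(self.output(x))

model = TwoLayerNet(10, 5, 1)
out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 1])
```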
Exercise 1.02: Defining a Single-Layer Architecture
In this exercise, we will use PyTorch's nn module to define a model for a single-layer neural network, and also define the loss function to evaluate the model. This will be the starting point so that you will be able to build more complex network architectures to solve real-life data problems. Perform the following steps to complete this exercise:
- Import torch as well as the nn module from PyTorch:
import torch
import torch.nn as nn
Note
torch.manual_seed(0) is being used in this exercise in order to ensure the reproducibility of the results that were obtained in this book's GitHub repository. However, when training a network for other purposes, a seed does not need to be defined.
To learn more about seed in PyTorch, visit https://pytorch.org/docs/stable/notes/randomness.html.
- Define the number of features of the input data as 10 (input_units) and the number of nodes of the output layer as 1 (output_units):
input_units = 10
output_units = 1
- Using the Sequential() container, define a single-layer network architecture and store it in a variable named model. Make sure to define one layer, followed by a Sigmoid activation function:
model = nn.Sequential(nn.Linear(input_units, output_units), \
nn.Sigmoid())
- Print your model to verify that it was created accordingly:
print(model)
Running the preceding code snippet will display the following output:
Sequential(
(0): Linear(in_features=10, out_features=1, bias=True)
(1): Sigmoid()
)
- Define the loss function as the MSE and store it in a variable named loss_funct:
loss_funct = nn.MSELoss()
- Print your loss function to verify that it was created accordingly:
print(loss_funct)
Running the preceding code snippet will display the following output:
MSELoss()
Note
To access the source code for this specific section, please refer to https://packt.live/2YNwyTy.
You can also run this example online at https://packt.live/2YOVPws. You must execute the entire Notebook in order to get the desired result.
You have successfully defined a single-layer network architecture.
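To check what the single-layer model actually contains, its parameters can be listed. The following sketch rebuilds the model from this exercise; with 10 inputs and 1 output, the linear layer holds a 1-by-10 weight matrix and a single bias, 11 trainable values in total.

```python
import torch.nn as nn

input_units = 10
output_units = 1
model = nn.Sequential(nn.Linear(input_units, output_units), nn.Sigmoid())

# Sequential names each parameter after the index of the layer that owns it
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (1, 10)
# 0.bias (1,)

total = sum(p.numel() for p in model.parameters())
print(total)  # 11
```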
The PyTorch optim Package
The optim package is used to define the optimizer that will be used to update the parameters in each iteration (which will be further explained in the following chapters) using the gradients calculated by the autograd module. Here, it is possible to choose from different optimization algorithms that are available, such as Adam, Stochastic Gradient Descent (SGD), and Root Mean Square Propagation (RMSprop), among others.
Note
The most popular optimization algorithms will be explained in subsequent chapters.
To set the optimizer to be used, the following line of code is enough, after importing the package:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
Here, the model.parameters() argument refers to the weights and biases from the model that were previously created, while lr refers to the learning rate, which was set to 0.01.
Weights are values that determine how important each piece of input information is to a given neuron. This means that every input has an accompanying weight for every neuron in the network that it feeds into. The bias, in turn, is similar to the intercept term of a linear function: it shifts the output of a neuron independently of its inputs.
The learning rate is a hyperparameter used in optimization processes to determine the size of the steps taken toward minimizing the loss function.
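The roles of weights and bias can be checked directly: an nn.Linear layer computes x @ W.T + b, where W has one row of weights per neuron and b has one bias per neuron. This sketch compares the layer's output with that formula written by hand; the layer and batch sizes are arbitrary.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # 3 inputs feeding 2 neurons
x = torch.randn(4, 3)

# weight has shape (2, 3): one weight per input for each neuron
# bias has shape (2,): one bias per neuron
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))  # True
```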
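For plain SGD, the learning rate scales each update directly: every parameter p becomes p - lr * gradient. The following sketch checks this on a single parameter; the loss p² and the starting value of 3.0 are arbitrary, while lr=0.01 matches the snippet above.

```python
import torch

p = torch.nn.Parameter(torch.tensor([3.0]))
optimizer = torch.optim.SGD([p], lr=0.01)

loss = (p ** 2).sum()  # the gradient is 2 * p = 6.0
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Expected update: 3.0 - 0.01 * 6.0 = 2.94
print(p.item())
```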
Next, the process of running the optimization for 100 iterations is shown here, which, as you can see, uses the model created by the nn module and the gradients calculated by the autograd library:
Note
The # symbol in the code snippet below denotes a code comment. The triple quotes ( """ ) shown in the code snippet below are used to denote the start and end points of a multi-line code comment. Comments are added into code to help explain specific bits of logic.
for i in range(100):
    # Call to the model to perform a prediction
    y_pred = model(x)
    # Calculation of loss function based on y_pred and y
    loss = loss_funct(y_pred, y)
    # Zero the gradients so that previous ones don't accumulate
    optimizer.zero_grad()
    # Calculate the gradients of the loss function
    loss.backward()
    """
    Call to the optimizer to perform an update
    of the parameters
    """
    optimizer.step()
For each iteration, the model is called to obtain a prediction (y_pred). This prediction and the ground truth values (y) are fed to the loss function in order to measure how well the model approximates the ground truth.
Next, the gradients are zeroed, and the gradients of the loss function are calculated using the backward() function.
Finally, the step() function is called to update the weights and biases based on the optimization algorithm and the gradients calculated previously.
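The gradients are zeroed because backward() adds to any gradient already stored rather than replacing it. This minimal sketch calls backward() twice without resetting, then resets the gradient by hand; the values are arbitrary.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

for _ in range(2):
    loss = (3 * w).sum()  # the gradient with respect to w is 3
    loss.backward()

accumulated = w.grad.clone()
print(accumulated)  # tensor([6.]): the two gradients of 3 were summed

# Reset by hand; optimizer.zero_grad() performs this reset for every parameter
# it manages (recent PyTorch versions set .grad to None instead of zeros)
w.grad.zero_()
print(w.grad)  # tensor([0.])
```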
Exercise 1.03: Training a Neural Network
Note
For this exercise, use the same Jupyter Notebook from the previous exercise (Exercise 1.02, Defining a Single-Layer Architecture).
In this exercise, we will learn how to train the single-layer network from the previous exercise, using PyTorch's optim package. Considering that we will use dummy data as input, training the network won't solve a data problem, but it will be performed for learning purposes. Perform the following steps to complete this exercise:
- Import torch, the optim package from PyTorch, and matplotlib:
import torch
import torch.optim as optim
import matplotlib.pyplot as plt
- Create dummy input data (x) of random values and dummy target data (y) that only contains zeros and ones. Tensor x should have a size of (20,10), while the size of y should be (20,1):
x = torch.randn(20,10)
y = torch.randint(0,2, (20,1)).type(torch.FloatTensor)
- Define the optimization algorithm as the Adam optimizer. Set the learning rate equal to 0.01:
optimizer = optim.Adam(model.parameters(), lr=0.01)
- Run the optimization for 20 iterations, saving the value of the loss in a variable. Every five iterations, print the loss value:
losses = []
for i in range(20):
    y_pred = model(x)
    loss = loss_funct(y_pred, y)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i%5 == 0:
        print(i, loss.item())
The output should look as follows:
0 0.25244325399398804
5 0.23448510468006134
10 0.21932794153690338
15 0.20741790533065796
The preceding output displays the epoch number (each iteration uses the whole dataset, so iterations and epochs coincide here), as well as the value of the loss function, which, as can be seen, is decreasing. This means that the training process is minimizing the loss function, so the model is fitting the relationship between the input features and the target in the training data.
- Make a line plot to display the value of the loss function in each epoch:
plt.plot(range(0,20), losses)
plt.show()
The output should look as follows:
As you can see, the loss function is being minimized.
Note
To access the source code for this specific section, please refer to https://packt.live/2NJrPfd.
You can also run this example online at https://packt.live/2BTnXWw. You must execute the entire Notebook in order to get the desired result.
With that, you have successfully trained a single-layer neural network.
Activity 1.01: Creating a Single-Layer Neural Network
For this activity, we will create a single-layer neural network, which will be a starting point from which we will create deep neural networks in future activities. Let's look at the following scenario.
You work as an assistant to the mayor of Somerville, and the HR department has asked you to build a model capable of predicting whether a person is happy with the current administration based on their satisfaction with the city's services. To do so, you have decided to build a single-layer neural network using PyTorch, based on the responses to previous surveys. Perform the following steps to complete this activity:
Note
The dataset that's being used for this activity was taken from the UC Irvine Machine Learning Repository, which can be downloaded using the following URL, from the Data Folder hyperlink: https://archive.ics.uci.edu/ml/datasets/Somerville+Happiness+Survey. It is also available in this book's GitHub repository: https://packt.live/38gzpr5.
- Import the required libraries, including pandas for reading a CSV file.
- Read the CSV file containing the dataset.
Note
It is recommended to use pandas' read_csv function to load the CSV file. To find out more about this function, visit https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html.
- Separate the input features from the target. Note that the target is located in the first column of the CSV file. Next, convert the values into tensors, making sure the values are converted into floats.
Note
To slice a pandas DataFrame, use pandas' iloc method. To find out more about this method, visit https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html.
- Define the architecture of the model and store it in a variable named model. Remember to create a single-layer model.
- Define the loss function to be used. In this case, use the MSE loss function.
- Define the optimizer of your model. In this case, use the Adam optimizer and a learning rate of 0.01.
- Run the optimization for 100 iterations, saving the loss value for each iteration. Print the loss value every 10 iterations.
- Make a line plot to display the loss value for each iteration step.
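As a hint for the slicing step, here is a sketch on a tiny hypothetical DataFrame whose first column plays the role of the target. The column names and values are invented for illustration and are not the Somerville dataset; the target is reshaped to one column so that it matches the (samples, 1) output of a single-output model.

```python
import pandas as pd
import torch

# Hypothetical stand-in for the survey data, with the target in the first column
data = pd.DataFrame({"target": [0, 1, 1],
                     "f1": [3, 5, 4],
                     "f2": [2, 4, 5]})

# iloc slices by position: column 0 is the target, the remaining columns are features
y = torch.tensor(data.iloc[:, 0].values, dtype=torch.float32).unsqueeze(1)
x = torch.tensor(data.iloc[:, 1:].values, dtype=torch.float32)
print(x.shape, y.shape)  # torch.Size([3, 2]) torch.Size([3, 1])
```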
Note
The solution to this activity can be found on page 236.