Using Kaggle kernels
Kaggle is a popular data-science website owned by Google. It started out with competitions in which participants had to build machine learning models to make predictions. Over the years, it has also grown to include a popular forum, an online learning platform and, most importantly for us, a hosted Jupyter service.
To use Kaggle, visit https://www.kaggle.com/. You will need to create an account before you can use the site.
After you've created your account, you can find the Kernels page by clicking on Kernels located in the main menu, as seen in the following screenshot:
Public Kaggle kernels
In the preceding screenshot, you can see a number of kernels that other people have written and published. Kernels can be private, but publishing kernels is a good way to show your skills and share knowledge.
To start a new kernel, click New Kernel. In the dialog that follows, you want to select Notebook:
The kernel editor
You will get to the kernel editor, which looks like the preceding screenshot.
Note that Kaggle is actively iterating on the kernel design, so a few elements might be in different positions, but the basic functionality is the same. The most important pieces of a notebook are the code cells. Here you can enter code and run it by clicking the run button in the bottom left, or alternatively by pressing Shift + Enter.
The variables you define in one cell remain available in the notebook's shared namespace, so you can access them in another cell. Markdown cells allow you to write text in markdown format to add a description of what is going on in your code. You can upload and download notebooks with the little cloud buttons featured in the top-right corner.
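As a minimal sketch of this behavior, assuming a fresh notebook, you could run the following two cells in order; the variable defined in the first cell is still available in the second:

# Cell 1: define a variable
message = 'Hello, Kaggle!'

# Cell 2: the same variable is still accessible here
print(message)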
To publish a notebook from the kernel editor, first click the Commit & Run button and then set the notebook to Public in the settings. To enable a GPU on your notebook, make sure to check the Enable GPU button located in the bottom right. It's important to remember that this will restart your notebook, so any variables you have defined will be lost.
Once you run the code, the run button turns into a stop button. If your code ever gets stuck, you can interrupt it by clicking that stop button. If you want to wipe all variables and begin anew, simply click the restart button located in the bottom-right corner.
With this system, you can connect a kernel to any dataset hosted on Kaggle, or alternatively you can just upload a new dataset on the fly. The notebooks belonging to this book already come with the data connection.
Kaggle kernels come with the most frequently used packages preinstalled, so most of the time you do not have to worry about installing packages.
Sometimes this book uses custom packages that are not installed on Kaggle by default. In that case, you can add custom packages at the bottom of the Settings menu. Instructions for installing custom packages will be provided when they are used in this book.
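Should you want to experiment yourself, one way to install a package from inside a running notebook is to prefix a pip command with an exclamation mark, which runs it as a shell command; the package name below is only a placeholder:

# Install a package from within a notebook cell ('some-package' is a placeholder)
!pip install some-package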
Kaggle kernels are free to use and can save you a lot of time and money, so it's recommended to run the code samples on Kaggle. To copy a notebook, go to the link provided at the beginning of the code section of each chapter and then click Fork Notebook. Note that Kaggle kernels can run for up to six hours.
Running notebooks locally
If you have a machine powerful enough to run deep learning operations, you can run the code samples locally. In that case, it's strongly recommended to install Jupyter through Anaconda.
To install Anaconda, simply visit https://www.anaconda.com/download to download the distribution. The graphical installer will guide you through the steps necessary to install Anaconda on your system. When installing Anaconda, you'll also install a range of useful Python libraries such as NumPy and matplotlib, which will be used throughout this book.
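To confirm that the installation succeeded, you can open your Terminal and check the version of the conda package manager that ships with Anaconda:

$ conda --version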
After installing Anaconda, you can start a Jupyter server locally by opening your machine's Terminal and typing the following command:
$ jupyter notebook
You can then visit the URL displayed in the Terminal. This will take you to your local notebook server.
To start a new notebook, click on New in the top-right corner.
All code samples in this book use Python 3, so make sure you are using Python 3 in your local notebooks. If you are running your notebooks locally, you will also need to install both TensorFlow and Keras, the two deep learning libraries used throughout this book.
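If you are unsure which Python version a notebook is running, a quick check is to print the interpreter version from a code cell:

# Verify that the notebook runs Python 3
import sys
print(sys.version)  # should start with 3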
Installing TensorFlow
Before installing Keras, we need to first install TensorFlow. You can install TensorFlow by opening a Terminal window and entering the following command:
$ sudo pip install tensorflow
For instructions on how to install TensorFlow with GPU support, see the official documentation at https://www.tensorflow.org/.
It's worth noting that you will need an NVIDIA CUDA-enabled GPU in order to run TensorFlow with GPU support. For instructions on how to install CUDA, visit https://docs.nvidia.com/cuda/index.html.
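Once TensorFlow is installed, a quick sanity check is to import it and print its version from a notebook or Python shell; assuming a TensorFlow 1.x installation, as used in this book, you can also ask whether a GPU is visible:

# Verify the TensorFlow installation
import tensorflow as tf
print(tf.__version__)              # the installed version
print(tf.test.is_gpu_available())  # True if TensorFlow can see a GPU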
Installing Keras
After you have installed TensorFlow, you can install Keras in the same way, by running the following command:
$ sudo pip install keras
Keras will now automatically use the TensorFlow backend. Note that TensorFlow 1.7 includes Keras built in; we'll cover this later on in this chapter.
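You can verify that Keras is installed and using TensorFlow by importing it; the standalone Keras package announces its backend on import:

# Verify the Keras installation; importing it prints 'Using TensorFlow backend.'
import keras
print(keras.__version__)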
Using data locally
To use the data for this book's code samples locally, visit the notebooks on Kaggle and download the connected datasets from there. Note that the file paths to the data will change depending on where you save them, so you will need to adjust the file paths when running the notebooks locally.
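As a hypothetical example of such a path change, a notebook on Kaggle reads connected datasets from the ../input/ directory, whereas locally you would point to wherever you saved the download; the filename here is only a placeholder:

import pandas as pd
# On Kaggle, connected datasets live under ../input/:
# df = pd.read_csv('../input/train.csv')
# Locally, adjust the path to where you saved the data:
df = pd.read_csv('./data/train.csv')  # 'train.csv' is a placeholder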
Kaggle also offers a command-line interface, which allows you to download the data more easily. Visit https://github.com/Kaggle/kaggle-api for instructions on how to achieve this.
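As a brief sketch of the workflow, assuming you have created an API token on your Kaggle account page and saved it to ~/.kaggle/kaggle.json as described in those instructions, downloading a dataset looks roughly like this; the dataset identifier is a placeholder:

$ pip install kaggle
$ kaggle datasets download -d <owner>/<dataset-name>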