IPython Interactive Computing and Visualization Cookbook

Introduction

In this chapter, we will see many features of the notebook, including the interactive widgets introduced in IPython 2.0. As we have only seen basic features in the previous chapters, we will dive deeper into the architecture of the notebook here.

What is the notebook?

The notebook was released in 2011, ten years after the creation of IPython. Its development has a long and complex history that is nicely summarized by Fernando Perez on his blog, http://blog.fperez.org/2012/01/ipython-notebook-historical.html. Inspired by mathematical software such as Maple, Mathematica, or Sage, the notebook really fostered the popularity of IPython.

By mixing together code, text, images, plots, hypertext links, and mathematical equations in a single document, the notebook brings reproducibility to interactive computing. The notebook, when used correctly, can radically change workflows in scientific computing. Prior to the notebook, one had to juggle between a text editor and an interactive prompt; now, one can stay focused within a single unified environment.

The notebook is not only a tool but also a powerful and robust architecture. Furthermore, this architecture is mostly language independent, so it's no longer tied to Python. The notebook defines a set of messaging protocols, APIs, and JavaScript code that can be used by other languages. Indeed, we are now seeing non-Python kernels that can interact with the notebook, such as IJulia, IHaskell, IRuby, and others.

At the SciPy conference in July 2014, the IPython developers even announced their decision to split the project into the following two parts:

  • The new Project Jupyter will implement all language-independent parts: the notebook, the messaging protocol, and the overall architecture. For more details, visit http://jupyter.org.
  • IPython will be the name of the Python kernel.

In this book, we do not make that semantic distinction, and we will use the term IPython to refer to the project as a whole (language-independent parts and Python kernel).

The notebook ecosystem

Notebooks are represented as JavaScript Object Notation (JSON) documents. JSON is a language-independent, text-based file format for representing structured documents. As such, notebooks can be processed by any programming language, and they can be converted to other formats such as Markdown, HTML, LaTeX/PDF, and others.
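
Because a notebook is plain JSON, it can be inspected with nothing more than a standard JSON parser. The following is a minimal sketch in Python. The document layout used here (a top-level cells list whose entries carry cell_type and source fields) is an assumption modeled on version 4 of the notebook format, and the helper name code_sources is hypothetical. Notebooks saved by older IPython versions are laid out differently.

```python
import json

# A hand-written, notebook-like JSON document (assumed layout, see above).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})


def code_sources(text):
    """Return the source of every code cell in a notebook JSON string."""
    nb = json.loads(text)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # The source may be stored as a list of lines or as one string.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


print(code_sources(notebook_text))
```

The same approach works from any language that has a JSON library, which is what makes conversion to other formats straightforward.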

An ecosystem is being built around the notebook, and we can expect to see more and more usage in the near future. For example, Google is working on bringing the IPython notebook to Google Drive for collaborative data analytics. Also, notebooks are being used to create slides, teaching materials, blog posts, research papers, and even books. In fact, this very book is entirely written in the notebook.

IPython 2.0 introduced interactive widgets in the notebook. These widgets bring Python and the browser even closer. We can now create applications that implement bidirectional communication between the IPython kernel and the browser. Also, any JavaScript interactive library can be, in principle, integrated within the notebook. For example, the D3.js JavaScript visualization library is now being used by several Python projects to bring interactive visualization capabilities to the notebook. We are probably going to see many interesting uses of these interactive features in the near future.

Architecture of the IPython notebook

IPython implements a two-process model, with a kernel and a client. The client is the interface offering the user the ability to send Python code to the kernel. The kernel executes the code and returns the result to the client for display. In the Read-Evaluate-Print Loop (REPL) terminology, the kernel implements the Evaluate, whereas the client implements the Read and the Print of the process.

The client can be a Qt widget if we run the Qt console, or a browser if we run the notebook. In the notebook, the kernel receives entire cells at once, and thus has no notion of a notebook. There is a strong decoupling between the linear document that makes up the notebook and the underlying kernel. This is a very strong constraint that may limit the possibilities, but it nevertheless leads to great simplicity and flexibility.
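
The division of labor between client and kernel can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: both sides live in the same process, there is no messaging layer, and the names ToyKernel and run_cells are hypothetical. It is not how IPython implements its kernel, but it shows the key property that the kernel only ever receives the text of one cell at a time.

```python
import contextlib
import io


class ToyKernel:
    """Evaluate step: runs whole cells in one persistent namespace."""

    def __init__(self):
        self.namespace = {}

    def execute(self, cell_source):
        # The kernel only sees the text of one cell; it does not know
        # where that cell sits in the document.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(cell_source, self.namespace)
        return buffer.getvalue()


def run_cells(kernel, cells):
    """Client side: Read each cell, send it, and collect what to Print."""
    return [kernel.execute(cell) for cell in cells]


kernel = ToyKernel()
outputs = run_cells(kernel, ["a = 2", "print(a * 21)"])
print(outputs)
```

State persists in the kernel's namespace between cells, whereas the order and layout of the cells are known only to the client.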

Another fundamental assumption in the whole architecture is that there can be at most one kernel connected to a notebook. However, IPython 3.0 offers the possibility of choosing the language kernel for any notebook.

It is important to keep these points in mind when thinking about new use-case scenarios for the notebook.

In the notebook, in addition to the Python kernel and the browser client, there is a Python server based on Tornado (www.tornadoweb.org). This process serves the HTML-based notebook interface.

All communication procedures between the different processes are implemented on top of the ZeroMQ (or ZMQ) messaging protocol (http://zeromq.org). The notebook communicates with the underlying kernel using WebSocket, a TCP-based protocol implemented in modern web browsers.

The browsers that officially support the notebook in IPython 2.x are as follows:

  • Chrome ≥ 13
  • Safari ≥ 5
  • Firefox ≥ 6

The notebook should also work on Internet Explorer ≥ 10. These requirements are essentially those for WebSocket.

Connecting multiple clients to one kernel

In a notebook, typing %connect_info in a cell gives the information we need to connect a new client (such as a Qt console) to the underlying kernel:

In [1]: %connect_info
{
  "stdin_port": 53978,
  "ip": "127.0.0.1", 
  "control_port": 53979, 
  "hb_port": 53980, 
  "signature_scheme": "hmac-sha256", 
  "key": "053...349", 
  "shell_port": 53976, 
  "transport": "tcp", 
  "iopub_port": 53977
}
Paste the above JSON code into a file, and connect with:
    $> ipython <app> --existing <file>
or, if you are local, you can connect with just:
    $> ipython <app> --existing kernel-6e0...b92.json
or even just:
    $> ipython <app> --existing
if this is the most recent IPython session you have started.

Here, <app> is console, qtconsole, or notebook.
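
To see what this connection information contains, we can parse it and list one address per channel. The following sketch is for illustration only: the helper name channel_addresses is hypothetical, and the transport://ip:port address form is the usual ZeroMQ endpoint syntax rather than something shown in the output above. The key is elided in the original output and is kept elided here.

```python
import json

# Connection info as printed by %connect_info above (key left elided).
connection_text = """
{
  "stdin_port": 53978,
  "ip": "127.0.0.1",
  "control_port": 53979,
  "hb_port": 53980,
  "signature_scheme": "hmac-sha256",
  "key": "053...349",
  "shell_port": 53976,
  "transport": "tcp",
  "iopub_port": 53977
}
"""


def channel_addresses(info):
    """Map each channel name to a transport://ip:port address string."""
    addresses = {}
    for field, value in info.items():
        if field.endswith("_port"):
            channel = field[:-len("_port")]
            addresses[channel] = "{}://{}:{}".format(
                info["transport"], info["ip"], value)
    return addresses


info = json.loads(connection_text)
for channel, address in sorted(channel_addresses(info).items()):
    print(channel, address)
```

Each of the five ports corresponds to a separate channel between a client and the kernel, which is why several clients can attach to the same kernel at once.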

It is even possible to have the kernel and the client running on different machines. You will find the instructions to run a public notebook server in the IPython documentation, available at http://ipython.org/ipython-doc/dev/notebook/public_server.html#running-a-public-notebook-server.

Security in notebooks

It is possible for someone to put malicious code in an IPython notebook. Since notebooks may contain hidden JavaScript code in a cell output, it is theoretically possible for malicious code to execute surreptitiously when the user opens a notebook.

For this reason, IPython 2.0 introduced a security model where HTML and JavaScript code in a notebook can be either trusted or untrusted. Outputs generated by the user are always trusted. However, outputs that were already there when the user first opened an existing notebook are untrusted.

The security model is based on a cryptographic signature present in every notebook. This signature is generated using a secret key that is specific to each user.
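
The general idea of a keyed signature can be sketched with the Python standard library. The choice of HMAC-SHA256, the way the notebook is serialized before signing, and the names sign and is_trusted are all assumptions made for illustration; the text above only states that a signature is computed with a per-user secret key.

```python
import hashlib
import hmac
import json


def sign(notebook, secret_key):
    """Return a hex HMAC-SHA256 digest of a canonical JSON serialization."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, signature, secret_key):
    """Compare the stored signature with a fresh one in constant time."""
    return hmac.compare_digest(sign(notebook, secret_key), signature)


secret_key = b"example-secret"  # hypothetical key, for illustration only
notebook = {"cells": [{"cell_type": "code", "source": "print(1)"}]}
signature = sign(notebook, secret_key)

print(is_trusted(notebook, signature, secret_key))

# Any change to the content invalidates the signature.
notebook["cells"][0]["source"] = "print(2)"
print(is_trusted(notebook, signature, secret_key))
```

Someone who does not know the secret key cannot produce a valid signature for modified content, so a notebook received from elsewhere cannot mark its own outputs as trusted.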

You can find further references on the security model in the following section.

References

The following are some references about the notebook architecture:

Here are a few (mostly experimental) kernels in non-Python languages for the notebook: