Hands-On Reactive Programming with Python

The history of asynchronous programming in Python

Asynchronous programming has been possible since the early days of Python, but only in the old way, that is, by using callbacks. Chapter 1, An Introduction to Reactive Programming, already explained some of the evolution in frameworks and programming languages that made asynchronous programming easier during the last few years. The Python language naturally followed that trend, and many incremental improvements have been made since the early 2000s. Figure 2.4 shows the history of the main changes in Python concerning asynchronous programming.

The items on the left side are elements that are still part of Python. The items on the right side are asynchronous frameworks and one deprecated module of the standard library:

Figure 2.4: The main steps of asynchronous programming in Python

The first dedicated support for asynchronous programming appeared in 1999, five years after the release of Python 1.0. The asyncore module was added to the standard library in release 1.5.2. This module was designed to develop asynchronous socket handlers. It was based on callbacks for the implementation of the handlers, and on either select or poll as a reactor (Chapter 1, An Introduction to Reactive Programming, contains more details on the reactor design pattern). This module has been deprecated since Python 3.6, because asyncio replaced it and supports many more features.

Then, in 2001, support for generators was added in Python 2.2. Generators allow you to implement a function that behaves like an iterator. The main benefit compared to building a list is that the whole sequence does not need to be stored in memory; instead, each item can be computed on demand. So, generators allow you to produce very big (or even infinite) sequences without storing them in memory. By themselves, generators have nothing to do with asynchronous programming. Most probably, the people who designed them never thought about asynchronous programming at the time. However, as we will detail later, generators have a property that makes them very useful in asynchronous programming: they are functions that can be suspended at a chosen location and resumed later at that same location, with their execution context restored.
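
The following minimal sketch illustrates both properties described above: values are computed on demand, and the function resumes where it stopped. The `naturals` function is a hypothetical name chosen for this illustration only.

```python
import itertools

def naturals():
    """Yield 0, 1, 2, ... forever; no list is ever built in memory."""
    n = 0
    while True:
        yield n  # execution is suspended here until the next value is requested
        n += 1

gen = naturals()
first_three = [next(gen), next(gen), next(gen)]
# The generator resumes after its last yield, with its local variable n restored.
next_two = list(itertools.islice(gen, 2))
print(first_three, next_two)  # prints [0, 1, 2] [3, 4]
```

Each call to next runs the function body only up to the following yield, which is why an infinite sequence is not a problem here.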

The first framework that made use of generators to ease asynchronous programming was Twisted (https://twistedmatrix.com). Twisted is an asynchronous framework that also uses callbacks for handlers, similarly to asyncore. However, the Twisted framework added two main improvements when its first release was published in 2002. The first one was the split between transport implementations and protocol implementations. The transport layer is in charge of transporting the messages, while the protocol layer is in charge of handling them. The second one was the use of generators to make asynchronous programming look like synchronous programming. Using generators instead of callbacks makes the code much more readable, since a sequence of asynchronous operations is handled in a single generator function instead of many callbacks.

The next two Python releases brought improvements to generators, with the support of generator expressions in Python 2.4 and the addition of the send method to generators in Python 2.5, published in 2004 and 2006, respectively. Generator expressions allow you to use generators with a syntax similar to list comprehensions. The send method is a big improvement for asynchronous code because it allows you to give back a value to the generator each time it is resumed. This allows a generator and its caller to communicate. Before this, only the generator could provide data to its caller.
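
As a small illustration of these two additions, the sketch below first builds a generator expression, and then uses send to pass values into a running generator. The `running_total` generator is a hypothetical example, not code from a framework.

```python
# Generator expression: list-comprehension syntax, but evaluated lazily.
squares = (n * n for n in range(4))
squares_list = list(squares)

def running_total():
    """Receive numbers through send() and yield the sum so far."""
    total = 0
    while True:
        value = yield total  # the argument of send() becomes the value of this yield
        total += value

acc = running_total()
next(acc)  # prime the generator: run it up to its first yield
totals = [acc.send(5), acc.send(10), acc.send(-3)]
print(squares_list, totals)  # prints [0, 1, 4, 9] [5, 15, 12]
```

The generator must be advanced once with next before the first send, because a value can only be delivered to a generator that is suspended on a yield.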

In July 2008, another asynchronous framework was published: Gevent (http://www.gevent.org/). Gevent is an alternative to Twisted. One main difference is that it uses the libev C library for the event loop implementation, whereas Twisted uses a pure Python implementation. Release 1.0 of Gevent was published in 2013.

Starting in 2011, a final series of evolutions dedicated to asynchronous programming added all the features needed to make Python a state-of-the-art language for asynchronous programming. Futures were added in 2011 with the release of Python 3.2. A future is a way to represent a value that is not available yet, but that will be available at some point in the future. Delegation to a sub-generator (the yield from expression) was added in 2012 with the release of Python 3.3.
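
Both features can be tried in a few lines. The sketch below assumes the futures of the concurrent.futures module from the standard library, and uses hypothetical function names (`slow_square`, `inner`, `outer`) for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    return x * x

# A future stands for a result that may not be available yet.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_square, 7)  # returns a Future immediately
    result = future.result()                  # waits until the value is available

def inner():
    received = yield 1
    return received * 2  # becomes the value of the yield from expression

def outer():
    doubled = yield from inner()  # delegate to the sub-generator
    yield doubled

gen = outer()
first = next(gen)      # value yielded by inner(), passed through outer()
second = gen.send(21)  # sent value reaches inner(), which returns 42
print(result, first, second)  # prints 49 1 42
```

With yield from, values sent to the outer generator are forwarded to the sub-generator, so a chain of generators can be written without manual plumbing.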

In 2014, the asyncio module was added to the standard library with the release of Python 3.4. This was a major addition that made Python async-ready without the need for third-party frameworks. The asyncio module took inspiration from existing frameworks and made their best ideas directly available in Python. This was a major improvement compared with asyncore. The split between the transport layer and the protocol layer of Twisted was reused here. The event loop could run on a reactor or a proactor depending on the operating system, and coroutines (based on generators) were used to write asynchronous code that looked like synchronous code.
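
To make the transport and protocol split more concrete, here is a minimal sketch of an asyncio.Protocol subclass. To stay self-contained, it is driven by hand with a hypothetical `RecordingTransport` stand-in instead of a real transport created by the event loop, so it only shows the division of responsibilities, not real network I/O.

```python
import asyncio

class UpperCaseProtocol(asyncio.Protocol):
    """Protocol layer: decides what to do with the received bytes."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Reply through the transport; the protocol does not know how bytes travel.
        self.transport.write(data.upper())

class RecordingTransport:
    """Hypothetical stand-in for a transport: it only records written data."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)

transport = RecordingTransport()
protocol = UpperCaseProtocol()
protocol.connection_made(transport)  # normally called by the event loop
protocol.data_received(b"hello")     # normally called when bytes arrive
print(transport.written)             # prints [b'HELLO']
```

In a real program, the event loop creates the transport and calls these methods itself; the protocol class stays the same whatever transport carries the bytes.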

The last evolution was in 2015, with the addition of the async/await syntax in Python 3.5. One can consider async/await as syntactic sugar on top of the generator yield keyword. The two new keywords make asynchronous code and coroutines explicit, in contrast to the generator syntax, which can also be used for purposes other than asynchronous programming. This last evolution made Python one of the first languages to support all the features that allow you to write asynchronous code almost as easily as synchronous code (or more objectively, easier to read than synchronous code).
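
As a quick taste of this syntax, the sketch below awaits two simulated asynchronous operations in sequence. The `fetch_value` coroutine is a hypothetical placeholder for a real operation, and asyncio.run assumes Python 3.7 or later (earlier versions would go through the event loop's run_until_complete method).

```python
import asyncio

async def fetch_value(delay, value):
    """Simulate an asynchronous operation that completes after a delay."""
    await asyncio.sleep(delay)  # suspends this coroutine without blocking the loop
    return value

async def main():
    # Reads like sequential code, yet each await is a suspension point.
    a = await fetch_value(0.01, 1)
    b = await fetch_value(0.01, 2)
    return a + b

total = asyncio.run(main())
print(total)  # prints 3
```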

The best way to understand why all these evolutions improved asynchronous programming in Python is to see the impact that they had by writing asynchronous code without these features and adding them one by one. We will go through these steps by implementing a simple state machine, driven by simulated asynchronous events. This same state machine will be implemented in different ways each time, using more recent features of Python. The state machine is the one shown in the following diagram:

Figure 2.5: A length-based packet unframing state machine

This state machine unframes packets coming from a data channel. Each frame has the following structure:

  • A sync word with a value of 42
  • An integer indicating the size of the payload
  • The payload, whose length must correspond to the value of the size field

For simplicity, the implementations will not care about the size and endianness of the words/integers (16, 32, or 64 bits, little or big-endian). We will simply use the Python integer type. Moreover, the channel is simulated. The aim here is to see how writing asynchronous code evolved, not to write real asynchronous code (but don't worry, this will come later). If a sync word other than 42 is received, then the state machine goes into an error state.
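
Before looking at asynchronous versions, it may help to see the frame format as plain synchronous code. The sketch below is only an illustration under several assumptions: the state names, the `Unframer` class, the treatment of the payload as a list of integers, and the handling of an empty payload are hypothetical choices, not taken from Figure 2.5.

```python
SYNC, SIZE, PAYLOAD, ERROR = "sync", "size", "payload", "error"

class Unframer:
    """Consume one integer at a time and collect complete payloads."""

    def __init__(self):
        self.state = SYNC
        self.size = 0
        self.payload = []
        self.frames = []

    def feed(self, word):
        if self.state == SYNC:
            # A frame must start with the sync word 42.
            self.state = SIZE if word == 42 else ERROR
        elif self.state == SIZE:
            self.size = word
            self.payload = []
            if word == 0:
                # Assumption: an empty payload completes the frame at once.
                self.frames.append([])
                self.state = SYNC
            else:
                self.state = PAYLOAD
        elif self.state == PAYLOAD:
            self.payload.append(word)
            if len(self.payload) == self.size:
                self.frames.append(self.payload)
                self.state = SYNC
        # In the ERROR state, further input is ignored.

unframer = Unframer()
for word in [42, 3, 1, 2, 3, 42, 1, 7]:
    unframer.feed(word)
print(unframer.frames)  # prints [[1, 2, 3], [7]]

unframer.feed(13)       # not a sync word
print(unframer.state)   # prints error
```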