
Algorithm complexity

Before we start with the dirty (and fun) job of improving program speed, I'd like to present a bit of computer science theory, namely the Big O notation.

Don't worry: I will not use pages of mathematical formulas or talk about infinitesimal asymptotics. Instead, I will present just the essence of the Big O notation, the parts that are important to every programmer.

In the literature and, of course, on the web, you will see expressions such as O(n), O(n^2), O(1), and similar. This fancy-looking notation hides a really simple story. It tells us how the running time of an algorithm grows as the size of the input data (n) grows.

The n^2 notation means "n to the power of two", or n². This notation is frequently used on the internet because it can be written with standard ASCII characters. This book uses the more readable variant, O(n²).

Let's say we have an algorithm with complexity of O(n), which on average takes T seconds to process input data of size N. If we increase the size of the data by a factor of 10 (to 10*N), then the algorithm will (on average) also use 10 times more time (that is, 10*T) to process the data. If we process 1,000 times more data, the program will also run 1,000 times slower.

If the algorithm complexity is O(n²), increasing the size of the data by a factor of 10 will cause the algorithm to run 10² or 100 times longer. If we want to process 1,000 times more data, then the algorithm will take 1,000² or a million times longer, which is quite a hit. Such algorithms are typically not very useful if we have to process large amounts of data.
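The scaling argument above can be checked by counting steps instead of measuring time. The following Delphi console program is only an illustrative sketch: LinearSteps and QuadraticSteps are hypothetical functions standing in for an O(n) and an O(n²) algorithm, and BaseSize plays the role of the original data size N.

```pascal
program GrowthDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// One pass over n items: the step count grows as O(n).
function LinearSteps(n: Integer): Int64;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to n do
    Inc(Result);
end;

// Every item paired with every item: the step count grows as O(n²).
function QuadraticSteps(n: Integer): Int64;
var
  i, j: Integer;
begin
  Result := 0;
  for i := 1 to n do
    for j := 1 to n do
      Inc(Result);
end;

const
  BaseSize = 1000; // hypothetical original input size N
begin
  // Increase the input size 10-fold and compare the number of steps.
  Writeln('O(n):   ', LinearSteps(10 * BaseSize) div LinearSteps(BaseSize), 'x more steps');
  Writeln('O(n^2): ', QuadraticSteps(10 * BaseSize) div QuadraticSteps(BaseSize), 'x more steps');
end.
```

The program reports 10 times more steps for the single loop and 100 times more for the nested loops, exactly the factors derived above.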

Most of the time, we use the Big O notation to describe how the computation time relates to the input data size. In that case, we call it time complexity. Sometimes, however, the same notation is used to describe how much storage (memory) the algorithm uses. In that case, we are talking about space complexity.
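Time and space complexity are independent of each other. As an illustration (the routine names are hypothetical), both routines in this sketch reverse an array in O(n) time, but ReversedCopy needs O(n) additional memory for the new array, while ReverseInPlace needs only O(1) additional memory.

```pascal
program SpaceDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TIntArray = array of Integer;

// O(n) extra space: builds a second array as large as the input.
function ReversedCopy(const data: TIntArray): TIntArray;
var
  i: Integer;
begin
  SetLength(Result, Length(data));
  for i := 0 to High(data) do
    Result[High(data) - i] := data[i];
end;

// O(1) extra space: swaps elements in place using a single temporary variable.
procedure ReverseInPlace(var data: TIntArray);
var
  left, right, tmp: Integer;
begin
  left := 0;
  right := High(data);
  while left < right do
  begin
    tmp := data[left];
    data[left] := data[right];
    data[right] := tmp;
    Inc(left);
    Dec(right);
  end;
end;

var
  data, reversed: TIntArray;
  i: Integer;
begin
  SetLength(data, 5);
  for i := 0 to High(data) do
    data[i] := i + 1;              // 1 2 3 4 5
  reversed := ReversedCopy(data);  // allocates a new 5-element array
  ReverseInPlace(data);            // reuses the memory of data
  for i := 0 to High(data) do
    Write(data[i], ' ');
  Writeln;
end.
```

The program prints 5 4 3 2 1, and reversed holds the same sequence in its own, separately allocated array.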

You may have noticed that I used the word average a lot in the last few paragraphs. When talking about algorithm complexity, we are mostly interested in the average behavior, but sometimes we also need to know about the worst-case behavior. We rarely talk about the best-case behavior because users don't really care much if the program is sometimes faster than average.

Let's look at an example. The following function checks whether a string parameter value is present in a string list:

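// TStrings is declared in System.Classes and SameText in System.SysUtils.
function IsPresentInList(strings: TStrings; const value: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  // Compare value with each element in turn (case-insensitively)
  // and exit as soon as a match is found.
  for i := 0 to strings.Count - 1 do
    if SameText(strings[i], value) then
      Exit(True);
  // No element matched, so the function returns False.
end;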

What can we tell about this function? The best case is really simple: the function finds that the value is equal to strings[0] and exits immediately. Great! The best behavior for our function is O(1). That, sadly, doesn't tell us much, as it won't happen frequently in practice.

The worst behavior is also easy to find. If the value is not present in the list, the code has to scan the whole strings list before deciding that it should return False. In other words, the worst behavior is O(n), where n represents the number of elements in the list. Incidentally (and without proof), the average behavior for this kind of search is also O(n).
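To make the best, worst, and average cases concrete, the following sketch instruments the same linear search with a comparison counter. CountedSearch and ListSize are hypothetical names introduced for this illustration; the search logic is the one from IsPresentInList.

```pascal
program SearchCost;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Same linear search as IsPresentInList, but it also reports how many
// comparisons were made before the function returned.
function CountedSearch(strings: TStrings; const value: string;
  out comparisons: Integer): Boolean;
var
  i: Integer;
begin
  Result := False;
  comparisons := 0;
  for i := 0 to strings.Count - 1 do
  begin
    Inc(comparisons);
    if SameText(strings[i], value) then
      Exit(True);
  end;
end;

const
  ListSize = 1000;
var
  list: TStringList;
  i, cost: Integer;
  total: Int64;
begin
  list := TStringList.Create;
  try
    for i := 0 to ListSize - 1 do
      list.Add('item' + IntToStr(i));

    CountedSearch(list, 'item0', cost);    // value is the first element
    Writeln('Best case:  ', cost);
    CountedSearch(list, 'missing', cost);  // value is not in the list
    Writeln('Worst case: ', cost);

    // Average cost when every element that is present is searched once.
    total := 0;
    for i := 0 to ListSize - 1 do
    begin
      CountedSearch(list, 'item' + IntToStr(i), cost);
      Inc(total, cost);
    end;
    Writeln('Average:    ', total / ListSize:0:1);
  finally
    list.Free;
  end;
end.
```

With 1,000 strings it reports 1 comparison for the best case, 1,000 for the worst case, and an average of 500.5, which is about n/2 and therefore still O(n).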

The Big O limits don't care about constant factors. If an algorithm uses n/2 steps on average, or even just 0.0001 * n steps, we still write this down as O(n). Of course, an algorithm that takes 10 * n steps is slower than one that takes n steps, and that is absolutely important when we fine-tune the code, but no constant factor C will make O(C * n) faster than O(log n) if n gets sufficiently large.
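The claim about constant factors can be checked numerically. This sketch pits a hypothetical linear algorithm costing 0.0001 * n steps against a hypothetical logarithmic one costing 1000 * log2(n) steps, and doubles n until the linear one becomes slower. Both cost models are assumptions chosen only to exaggerate the constants.

```pascal
program ConstantFactors;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical O(n) cost model with a tiny constant factor.
function LinearCost(n: Double): Double;
begin
  Result := 0.0001 * n;
end;

// Hypothetical O(log n) cost model with a huge constant factor.
function LogCost(n: Double): Double;
begin
  Result := 1000 * Ln(n) / Ln(2); // 1000 * log2(n)
end;

var
  n: Double;
  k: Integer;
begin
  // Start at n = 2^1 and double n while the O(n) algorithm is still cheaper.
  n := 2;
  k := 1;
  while LinearCost(n) <= LogCost(n) do
  begin
    n := n * 2;
    Inc(k);
  end;
  Writeln('First power of two where the O(n) algorithm is slower: 2^', k);
end.
```

Even with a 10-million-fold advantage in the constant factor, the linear algorithm falls behind at n = 2^29 (about 537 million elements) and only gets further behind from there.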

There are better ways to check whether an element is present in some data than searching the list sequentially. We will explore one of them in the next section, Big O and Delphi data structures.

While the function of n inside the O() notation can be anything, there are some O functions that appear constantly in standard programming problems. The following table shows those Big O limits and the most common examples of problems that belong to each class:

If we care about program performance, then O(1) algorithms are of special interest to us, as they represent algorithms that don't get slower (at least not noticeably) when we increase the problem size. We'll see an example of such O(1) algorithms in the next section.

When we deal with algorithms that search in some datasets, we usually try to make them behave as O(log n), not O(n), as the former slows down much, much more gradually than the latter when the data grows.
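Binary search over a sorted array is one standard way to get O(log n) lookups: every probe halves the range that can still contain the value. In this sketch, BinarySearch, ItemCount, and the even-number test data are illustrative assumptions, not code from any library.

```pascal
program BinarySearchDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TIntArray = array of Integer;

// Searches a sorted array; returns the index of value or -1.
// probes reports how many loop iterations (comparisons) were needed.
function BinarySearch(const data: TIntArray; value: Integer;
  out probes: Integer): Integer;
var
  left, right, mid: Integer;
begin
  probes := 0;
  left := 0;
  right := High(data);
  while left <= right do
  begin
    Inc(probes);
    mid := left + (right - left) div 2;
    if data[mid] = value then
      Exit(mid)
    else if data[mid] < value then
      left := mid + 1   // value can only be in the upper half
    else
      right := mid - 1; // value can only be in the lower half
  end;
  Result := -1;
end;

const
  ItemCount = 1000000;
var
  data: TIntArray;
  i, probes, found: Integer;
begin
  SetLength(data, ItemCount);
  for i := 0 to ItemCount - 1 do
    data[i] := 2 * i; // sorted, even numbers only

  found := BinarySearch(data, 2 * (ItemCount - 1), probes);
  Writeln('Found at ', found, ' after ', probes, ' probes');
  found := BinarySearch(data, 3, probes); // odd, so not present
  Writeln('Not found (', found, ') after ', probes, ' probes');
end.
```

For a million sorted items, the loop needs at most 20 probes (⌊log2 1,000,000⌋ + 1), whereas a sequential scan may need all 1,000,000 comparisons. The price is that the data must be kept sorted.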

Another big class of problems deals with sorting the data. While the naive approaches sort in O(n²), better algorithms (such as mergesort and quicksort) need on average just O(n log n) steps.
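The difference between a naive O(n²) sort and an O(n log n) one shows up clearly if we count comparisons. This sketch uses insertion sort as the naive approach and a textbook top-down merge sort as the better one. The global Comparisons counter and the Greater helper are hypothetical instrumentation, not part of any library.

```pascal
program SortCost;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TIntArray = array of Integer;

var
  Comparisons: Int64; // incremented by Greater, reset before each sort

// Compares two elements and counts the comparison.
function Greater(a, b: Integer): Boolean;
begin
  Inc(Comparisons);
  Result := a > b;
end;

// Naive approach, O(n²) comparisons: insertion sort.
procedure InsertionSort(var data: TIntArray);
var
  i, j, value: Integer;
begin
  for i := 1 to High(data) do
  begin
    value := data[i];
    j := i - 1;
    // Shift larger elements one position to the right.
    while (j >= 0) and Greater(data[j], value) do
    begin
      data[j + 1] := data[j];
      Dec(j);
    end;
    data[j + 1] := value;
  end;
end;

// O(n log n) approach: top-down merge sort of data[first..last];
// temp must be at least as long as data.
procedure MergeSort(var data, temp: TIntArray; first, last: Integer);
var
  mid, i, j, k: Integer;
begin
  if first >= last then
    Exit;
  mid := (first + last) div 2;
  MergeSort(data, temp, first, mid);
  MergeSort(data, temp, mid + 1, last);
  // Merge the two sorted halves into temp, then copy the result back.
  i := first;
  j := mid + 1;
  for k := first to last do
    if (j > last) or ((i <= mid) and not Greater(data[i], data[j])) then
    begin
      temp[k] := data[i];
      Inc(i);
    end
    else
    begin
      temp[k] := data[j];
      Inc(j);
    end;
  for k := first to last do
    data[k] := temp[k];
end;

const
  ItemCount = 2000;
var
  a, b, temp: TIntArray;
  i: Integer;
  insertionCost, mergeCost: Int64;
begin
  // The same reversed input (worst case for insertion sort) for both sorts.
  SetLength(a, ItemCount);
  for i := 0 to ItemCount - 1 do
    a[i] := ItemCount - i;
  b := Copy(a, 0, Length(a));

  Comparisons := 0;
  InsertionSort(a);
  insertionCost := Comparisons;

  Comparisons := 0;
  SetLength(temp, Length(b));
  MergeSort(b, temp, 0, High(b));
  mergeCost := Comparisons;

  Writeln('Insertion sort: ', insertionCost, ' comparisons');
  Writeln('Merge sort:     ', mergeCost, ' comparisons');
end.
```

On 2,000 elements in reverse order, insertion sort performs 1,999,000 comparisons (n(n − 1)/2), while merge sort stays below n · ⌈log2 n⌉ = 22,000.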

The following image shows how the time complexity for these typical limits (we have used 2ⁿ as an example of a more generic cⁿ) grows when we increase the problem size up to 20-fold:

Most frequently encountered Big-O limits

We can see that O(1) and O(log n) grow very slowly. While O(n log n) grows faster than O(n), it also grows much slower than O(n²), which we had to stop plotting when data was increased nine-fold.

The O(2ⁿ) curve starts slowly and looks like a great solution for small data sizes (small n), but then it starts rising terribly fast, much faster than O(n²).

The following table shows how fast O(n log n) and O(n²) grow compared with O(n), and how quickly O(2ⁿ) explodes.

The data column shows the data size increase factor. The number 10 in this column, for example, represents input with 10 times more elements than in the original data:

We can see from this table that O(log n) algorithms present a big improvement over O(n) algorithms (8 versus 100 times increase in time when data increases 100-fold). We can also see that the O(2ⁿ) quickly becomes completely unmanageable.
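Growth factors like these can be recomputed with a few lines of code. The normalization in this sketch is an assumption: it models the O(log n) column as log2(k) + 1, so that the original data size (k = 1) maps to a factor of 1, and the O(2ⁿ) column as 2^(k − 1). Under that assumption, the 100-fold row gives about 8 for O(log n) versus 100 for O(n), and the 300-fold row gives about 10⁹⁰ for O(2ⁿ), consistent with the figures quoted in the text.

```pascal
program GrowthTable;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Growth factors for data increased k-fold, normalized so that k = 1 gives 1.
// Assumption: log n is modelled as log2(k) + 1 and 2^n as 2^(k - 1).
function LogFactor(k: Double): Double;
begin
  Result := Ln(k) / Ln(2) + 1;
end;

function ExpFactor(k: Double): Double;
begin
  Result := Exp((k - 1) * Ln(2)); // 2^(k - 1)
end;

const
  DataFactors: array [0..5] of Integer = (1, 2, 10, 20, 100, 300);
var
  i, k: Integer;
begin
  Writeln('data':6, 'log n':8, 'n':6, 'n log n':10, 'n^2':8, '2^n':14);
  for i := Low(DataFactors) to High(DataFactors) do
  begin
    k := DataFactors[i];
    // The 2^n column is printed in scientific notation because it explodes.
    Writeln(k:6, LogFactor(k):8:1, k:6, k * LogFactor(k):10:0, k * k:8,
      ExpFactor(k):14);
  end;
end.
```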

The last cell in this table is particularly interesting. There are different estimates for the number of elementary particles (electrons, protons, neutrons, and so on) in the visible universe, but they all lie somewhere around 10⁹⁰. Suppose we have a computer that can solve an O(2ⁿ) problem in a reasonable time. If we increased the input data by a factor of just 300, we would need 10⁹⁰ computers to solve the new problem in the same time. That is as many computers as there are particles in the visible universe!
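The 10⁹⁰ figure follows from converting a power of two into a power of ten. Assuming the original problem has size n = 1 and the enlarged one has size n = 300, the running time of an O(2ⁿ) algorithm grows by the factor:

$$
\frac{2^{300}}{2^{1}} = 2^{299} = 10^{299 \log_{10} 2} \approx 10^{299 \times 0.30103} \approx 10^{90.008} \approx 1.02 \times 10^{90}
$$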

Don't use algorithms which have time complexity O(2ⁿ). It won't end well.
