Delphi High Performance
上QQ阅读APP看书,第一时间看更新

Big O and Delphi data structures

Delphi's Run-Time Library (RTL) contains many data structures (classes that are specifically designed to store and retrieve data), mostly stored in System.Classes and System.Generics.Collection units that greatly simplify everyday work. We should, however, be aware of their good and bad sides.

Every data structure in the world is seeking a balance between four different types of data access: accessing the data, inserting the data, searching for data, and deleting data. Some data structures are good in some areas, others in different ones, but no data structure in this world can make all four operations independent of data size.

When designing a program, we should therefore know what our needs are. That will help us select the appropriate data structure for the job.

The most popular data structure in Delphi is undoubtedly TStringList. It can store a large amount of strings and assign an object to each of them. It can—and this is important—work in two modes, unsorted and sorted. The former, which is a default, keeps strings in the same order as they were added while the latter keeps them alphabetically ordered.

This directly affects the speed of some operations. While accessing any element in a string list can always be done in a constant time (O(1)), adding to a list can take O(1) when the list is not sorted and O(log n) when the list is sorted.

Why that big difference? When the list is unsorted, Add just adds a string at its end. If the list is, however, sorted, Add must first find a correct insertion place. It does this by executing a bisection search, which needs O(log n) steps to find the correct place.

The reverse holds true for searching in a string list. If it is not sorted, IndexOf needs to search (potentially) the whole list to find an element. In a sorted list, it can do it much faster (again by using a bisection) in O(log n) steps.

We can see that TStringList offers us two options - either a fast addition of elements or a fast lookup, but not both. In a practical situation, we must look at our algorithm and think wisely about what we really need and what will behave better.

To sort a string list, you can call its Sort method or you can set its Sorted property to True.  There is, however, a subtle difference that you should be aware of. While calling Sort sorts the list, it doesn't set its internal is sorted flag and all operations on the list will proceed as if the list is unsorted. Setting Sorted := True, on the other hand, does both - it sets the internal flag and calls the Sort method to sort the data.

To store any (non-string) data, we can use traditional TList and TObjectList classes or their more modern generic counterparts, TList<T> and TObjectList<T>. They all always work in an unsorted mode and so adding an element takes O(1) while finding and removing an element takes O(n) steps.

All provide a Sort function which sorts the data with a quicksort algorithm (O(n log n) on average) but only generic versions have a BinarySearch method, which searches for an element with a bisection search taking O(log n) steps. Be aware that BinarySearch requires the list to be sorted but doesn't make any checks to assert that. It is your responsibility to sort the list before you use this function.

If you need a very quick element lookup, paired with a fast addition and removal, then TDictionary is the solution. It has methods for adding (Add), removing (Remove) and finding a key (ContainsKey and TryGetValue) that, on average, function in a constant time, O(1). Their worst behavior is actually quite bad, O(n), but that will only occur on specially crafted sets of data that you will never see in practical applications.

I've told you before that there's no free lunch and so we can't expect that TDictionary is perfect. The big limitation is that we can't access the elements it is holding in a direct way. In other words, there is no TDictionary[i]. We can walk over all elements in a dictionary by using a for statement, but we can't access any of its elements directly. Another limitation of TDictionary is that it does not preserve the order in which elements were added.

Delphi also offers two simple data structures that mimic standard queue—TQueue<T>—and stack—TStack<T>. Both have very fast O(1) methods for adding and removing the data, but they don't offer any bells and whistles—there is no direct data access, we cannot search for data, and so on. We can only insert (Enqueue in queue or Push in stack) and remove (Dequeue and Pop) data.

To help you select the right tool for the job, I have put together a table showing the most important data structures and their most important methods, together with average and (when they differ from the average) worst-case time complexities:

The table shows the time complexity of the most important operations on built-in Delphi data structures. Complexity for the worst case is only listed if it differs from the average complexity.