Performance of Python Data Structures

It is important for you to understand the efficiency of these Python data structures because they are the building blocks we will use as we implement other data structure in the remainder of the book.

Lists

The designers of Python had many choices to make when they implemented the list data structure. Each of these choices could have an impact of how fast on how fast list operations perform. To help them make the right choices they looked at the ways that people would most commonly use the list data structure and they optimized their implementation of a list so that the most common operations were very fast. Of course they also tried to make the less common operations fast, but when a tradeoff and to be made the performance of a less common operation was often sacrificed in favor of the more common operation.

Two common operations are indexing and assigning to an index position. Bothe of these operations take the same amount of time no matter how large the list becomes. When an operation like this is independent of the size of the list they are O(1) .

Another very common programming task is to grow a list. There are two ways to create a longer list. You can use the append method or the concatenation operator.

  • The append method is O(1) .
  • However, the concatenation operator is O(k) where k is the size of the list that is being concatenated.

This is important for you know because it can help you make your own programs more efficient by choosing the right tool for the job.

Let us look at four different ways we might generate a list of n numbers starting with 0. First we will try a for loop and create the list by concatenation, then we will use append rather than concatenation.

def test1():
    l = []
    for i in range(1000): l = l + [i]

def test2():
    l = []
    for i in range(1000): l.append(i)

def test3():
    l = [i for i in range(1000)]

def test4():
    l = list(range(1000))
# Import the timeit module
import timeit
# Import the Timer class defined in the module
from timeit import Timer

t1 = Timer("test1()", "from __main__ import test1")
print("concat ", t1.timeit(number=1000), "milliseconds")
t2 = Timer("test2()", "from __main__ import test2")
print("append ", t2.timeit(number=1000), "milliseconds")
t3 = Timer("test3()", "from __main__ import test3")
print("comprehension ", t3.timeit(number=1000), "milliseonds")
t4 = Timer("test4()", "from __main__ import test4")
print("list range.", t4.timeit(number=1000), "milliseconds")

"""concat 6.54352807999 milliseconds"""
"""append 0.306292057037 milliseconds"""
"""comprehension 0.147661924362 milliseconds"""
"""list range 0.0655000209808 milliseconds"""

In the above experiment we also show the times for two additional methods for creating a list:

  • using the list constructor with a call to range and
  • a list comprehension.

It is interesting to note that the list comprehension is twice as fast as a for loop with an append operation. After thinking carefully about Table 2.2, you may be wondering about the two different times for pop.

  • When pop is called on the end of list it take O(1), but
  • When pop called on the first element in the list or anywhere in the middle it is O(n).

The reason for this lies in how Python chooses to implement lists.

  • When an item is taken from the front of the list, all the other elements in the list are shifted one position closer to the beginning. This may seem silly to you now, but if you look at Table 2.2 you will see that this implementation also allows the index operation to be O(1). This is a tradeoff that the Python implementors shout was a good one.

As a a way of demonstrating this difference in performance let us do another experiment using the timeit module.

  • Our goal is to be able to verify the performance of the pop operation on a list of a known size when the program pops from the end of the list, and again when the program pops from the beginning of the list.
  • We will also want to measure this time for lists of different size. What we would expect to see is that the time required to pop from the end of the list will stay constant even as the list grows in size, while the time to pop from the beginning of the list will continue to increase as the list grows.

The code below shows one attempt to measure the difference between the two uses of pop. As you can see from this first example, popping from the end takes 0.0003 milliseconds, whereas popping from the beginning takes 4.82 milliseconds. For a list of two million elements this is a factor of 16,000.

pop_zero = Timer("x.pop(0)",
                 "from __main__ import x")
pop_end = Timer("x.pop()",
                "from __main__ import x")

x = list(range(2000000))
pop_zero.timeit(number=1000)

x = list(range(2000000))
pop_end.timeit(number=1000)

tab_2-2

There are a couple of things to notice about this code. The first is the statement from __main__ import $x$. Although we did not define a function we do want to be able to use the list object x in our test. This approach allows us to time just the single pop statement and get the most accurate measure of the time for that single operation. Because the timer repeats 1000 times it is also important to point out that the list is decreasing in size by 1 each time through the loop. But since the initial list is two million elements in size we only reduce the overall size by 0.05\% .

pop_zero = Timer("x.pop(0)",
                 "from __main__ import x")
pop_end = Timer("x.pop()",
                "from __main__ import x")
print("pop(0) pop()")
for i in range(10000000, 100000001, 1000000):
    x = list(range(i))
    pt = pop_end.timeit(number=1000)
    x = list(range(i))
    pz = pop_zero.timeit(number=1000)
    print("%15.5f, %15.5f" %(pz, pt))

Dictionaries

The second major Python data structure is the dictionary. As you probably recall, dictionaries differ form lists in that you can access items in a dictionary by a key rather than a position. The thing that is most important to notice right now I that the get item and set item operations on a dictionary are O(1) . Another important dictionary operation is the contains operation. Checking to see whether a key is in the dictionary or not is also O(1) . The efficiency of all dictionary operations is summarized in Table 2.3. One important side not on dictionary performance is that the efficiencies we provide in the table are fora average performance. In some rare cases the contains, get item, and set item operations can degenerate into O(n) performance bet we will get into that when we talk about the different ways that a dictionary could be implemented.

tab_2-3

For our last performance experiment we will compare the performance of the contains operation between lists and dictionaries. In the process we will confirm that the contains operator for lists is O(n) and the contains operator for dictionaries is O(1) . The experiment we will use to compare the two is simple. We'll make a list with a range of numbers in it. The then we will pick numbers at random and check to see if the numbers are in the list.

To be continued.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s