Python libraries: heapq

7 min read
PythonAlgorithmsHeap
TLDR
The Python heapq standard module implements a priority queue using a heap data structure (see Cormen et al. Introduction to Algorithms - Chapter 6). Priority queues are often used to support more complex algorithms like sorting and compressing. This post explores the heapq module and how to use it.

What is a heap

A heap is a data structure that can be used to store values in a tree-like structure maintaining a partial . Although it is convenient to think of a heap as a tree, its implementation uses arrays (lists in Python).
There are two types of heap: min-heap and max-heap, we cover the former, which is the one implemented in the heapq Python module. A min-heap (from now on, simply heap) is a binary tree that satisfies the heap property
For each non-root node vv with parent uu
v.valueu.valuev.value \geq u.value
In words, children have values no smaller than the value of their parent (for a max-heap the relation between values is \leq instead of \geq).
One use of the heap is in sorting, it is in fact the basis of an efficient sorting algorithm called heapsort and is based on the following fact.
Fact: The minimum value is at the root
If we apply the above relation all the way up to the root we can easily see that the minimum value can be found on the root.1
Heap structure VS Heap memory
It is important not to confuse the heap memory, where dynamic objects are stored with the heap data structure which is a specific type of data structure. The two things are completely different things and only share the name!

The heapq module

Python comes with a built-in implementation of a min-heap, which is in the heapq module. The module offers several functions to work on heaps stored in ordinary Python lists. To begin with heaps, you can simply define a list with the content and heapify it with heapq.heapify
q = [7, 17, 3, 20, 2, 27, 40]
heapq.heapify(q)
print(q) # [2, 7, 3, 20, 17, 27, 40]
Once we have a heap, we can pop (extract) elements out of it
m = heapq.heappop(q) 
print(m) # 2
print(q) # [3, 7, 27, 20, 17, 40]
or we can push (add) elements to it
heapq.heappush(q, 10)
print(q) # [3, 7, 10, 20, 17, 40, 27]
There is also a function that does the two operations in one: pop first and push second
k = heapq.heapreplace(q, 1)
print(k) # 3
print(q) # [1, 7, 10, 20, 17, 40, 27]
or push first and pop second
l = heapq.heappushpop(q, 0)
print(l) # 0
print(q) # [1, 7, 10, 20, 17, 40, 27]
To summarize the most commonly used functions in heapq
  • heapify(q): makes q a heap in-place;
  • heappush(q,e): adds e to the heap q
  • heappop(q): extracts the smallest value from the heap q
  • heapreplace(q,e): equivalent to pop followed by push(e)
  • heappushpop(q,e): equivalent to push(e) followed by pop
All above functions maintain the heap property.
Max heap
The heapq historically supported only min heap since its introduction in Python 2.3.Starting with Python 3.14, the library also supports max heap within the same heapq module with functions having the suffix _max, like heapify_max, heappush_max, ...

Three ways to use heaps

So, what is a heap useful for? The key property is the fact that we can efficiently obtain the minimum (maximum for max-heap) stored value. One can see the heap as a kind-of sorted structure in that it always returns the current minimum, although the structure itself is not sorted. This idea leads to an immediate sorting algorithm.

Heapsort

Using the heap is relatively easy to implement a sorting algorithm that takes the name of heapsort. The idea is to create a heap with all values and continuously extract the minimum as long as the heap is not empty.
def heapsort(l):
    s = []
    heapq.heapify(l)
    while(len(l) > 0):
        s.append(heapq.heappop(l))
    return s

print(heapsort([7, 17, 3, 20, 2, 27, 40]))
# [2, 3, 7, 17, 20, 27, 40]
Notice that this function "destroys" the input list l and puts elements in s sorted by increasing key.
In-place heapsort
The algorithm presented above works in-place (it doesn't require additional memory) as long as Python lists shrinks as we remove elements from them. In this case, the heap q becomes smaller by the same amount the list a becomes larger and no additional space is required.
It is possible to devise a strategy that implements heapsort in the same list (array). One way is to use a max-heap so that e push returns the maximum of the values stored. Once extracted these values are stored in the back of the array where the positions are no longer part of the heap.

Priority queue

A queue is a way to store objects so that the first that is added is the first that is eliminated (*First-In First-Out, FIFO). A priority queue is a way of storing objects so that each one has a priority number associated to it. The object to exit the queue next is the one that has the highest priority.
To implement a priority queue in Python using the heapq module, we leverage the fact the Python tuples have an implicit ordering. For example the tuple (1,"Hello"), comes before the tuple (2, "Hello"), since 1 comes before 2. We use a tuple with the priority as first element and the object as second element. In our implementation, smaller numbers mean higher priority, because heapq is a min-heap.
import heapq

people = [
    (3, "Alice"),
    (1, "Bob"),
    (5, "Carol"),
    (4, "Dave")
]
queue = []
for person in people:
    heapq.heappush(queue, person)
while(len(queue) > 0):
    extracted = heapq.heappop(queue)
    print(f"Next: {extracted[1]} (priority={extracted[0]})")

Path finding algorithm (Dijkstra)

A priority queue is a useful in many applications, one of these applications is the Dijkstra algorithm for path finding. Path finding algorithms are ubiquitous, you find it in navigation apps and video games, to mention two examples.
I have written a post on Dijkstra algorithm where you can see an example of usage of the heapq module.

Conclusion

The heapq module is one of many utility modules present in the Python standard library. We have seen a few example where such a module can become in handy like sorting and prioritizing. This concludes the first post on Python standard library.

🐻 📆 Bear of the day: Blanky

Blanky

Footnotes

  1. The proof is quite simple for the case where all values in the tree are distinct. Let uu be the node with the minimum value. Assume that uu is not the root and let vv be its parent. By the property above, the value of vv must be smaller than the value of uu. However, this is not possible, since we chose uu to have the smallest value. Hence, we reached a contradiction and uu must be the root. The result still holds with duplicate values, but the minimum may not be unique.