Priority Queues, Heaps and Arrays --
Representing A as B, which is represented as C

by Jan Plaza

A priority queue is a container that holds some comparable items, let's say, numbers. When created, the container is empty. You can put a number into this container. You can also request to get a number from the container, and you will always get the smallest one, which in this way gets removed from the container. (One could also consider an operation "peek" that tells the value of the smallest number without removing it. Another optional operation is a test whether the priority queue is empty or a more general operation that returns the number of items stored.)

This description suggests that the priority queue is a list of integers in increasing order, with insertions performed in the correct places to maintain the order, and where you can only delete the first element of the list. Although priority queues could be implemented as such lists, this would be not efficient. In the case of a linked list implementation, inserting an element would require a sequential search. In the case of an array implementation, inserting an element would start from finding the correct place (by binary search, in a logarithmic time) but would also require shifting the items to the right of the insertion place. In either version, in the worst case the complexity would be proportional to the number of elements already present in the list.

A better complexity can be obtained if we represent a priority queue as a heap, i.e. a complete binary tree in which every parent node holds a value smaller than its children. A rooted ordered binary tree is called a complete tree if all the levels other than last level are full and at the last level, if you consider the rightmost leaf, every possible node to the left of it is present. This means that if a complete tree with n nodes is embedded in a full binary tree with the same root and breadth-first search is performed on the full tree, all the nodes of the complete tree are visited before any other nodes are visited. Notice that the definition of the heap implies that the smallest item resides in the root. There are two basic operations on heaps, corresponding to the two basic operations on priority queues:

insert a new number as the "last" leaf of that tree and trickle it up to an appropriate place -- this results in a proper heap,
delete the number residing in the root of the tree, move the number from the "last" leaf of the tree into the vacant root and trickle it down to an appropriate place -- this results in a proper heap.

These operations can be performed in a time proportional to the logarithm of the number of elements in the heap.

So far we opted for replacing a sequential structure by a tree structure. This is not surprising at all because one always expects that a tree will offer a better access time. But now, we are going to replace the heap by an array, as we could do with any complete tree. Namely, we will represent a heap as an array in which the root resides at the index 1 and the children of any node that is at index k reside at indexes 2k and 2k+1. Heap operations can be easily implemented using this schema and they still have logarithmic complexity.

So, a priority queue, although at an abstract level it seems to have a sequential structure, is represented as a heap, which is a particular kind of a complete tree, which is implemented as an array.

Here are some important observations.

Objects may have a structure different from what it seems to be;
Trees are fast; lists are slow;
Arrays are more than lists; they have random access (and this makes them suitable for implementing complete trees).

The invention of heaps and such an implementation of priority queues is attributed to Donald B. Johnson. This is one of the pearls of computer science. Any good work brings additional benefits. Heaps lead to a new sorting algorithm, heapsort: start from an empty heap; insert all items to be sorted, then keep deleting the smallest item -- the items will come in the non-decreasing order. Heapsort has complexity n log n, is in-place, on-line, non-adaptive and not stable.

Think about the following.

If a heap item is stored in an array at the index n, what are the indexes of its parent, its left child, its right child, its left sibling and its right sibling?
What formulas would you use to find indexes of the parent and the children if the root of the tree were placed at index 0?
If you implemented a heap as a linked tree, how would you efficiently find the last leaf (needed for heap operations)?
Which implementation of a heap uses less memory, the linked tree implementation or the array implementation, and by how much?
Should your implementation of the heap store the current heap size?

Originally published in Standard Output, Vol. 2, No. 4, September 2007.

Priority Queues, Heaps and Arrays -- Representing A as B, which is represented as C

by Jan Plaza

Priority Queues, Heaps and Arrays --
Representing A as B, which is represented as C