Random Permutation Generation
Many problems require the use of randomly generated permutations, kpermutations, and ksubsets. Simple and efficient algorithms exist for producing these combinatorial objects; however, these algorithms are somewhat subtle and most people do not intuitively understand them. The algorithm for generating random permutations is the most significant of these algorithms, and the other algorithms are simple variants of this algorithm.
Permutations are rearrangements of items where the order the items are listed in is important. For example, the rearrangements {1,3,2} and {1,2,3} are two different permutations of the numbers [1,2,3]. A permutation has the property that it contains all the items in a given set without any repetitions. For instance, the lists {1,2}, and {1,2,2,3} are not permutations of [1,2,3].
It is helpful to visualize permutations as paths in a permutation tree. A permutation tree models the choices available when forming a permutation. The colored edges of the permutation tree denote the items that are being permuted. A particular item corresponds to edges of a specific color. Following an edge in the permutation tree is equivalent to selecting an item, and permutations are formed by following paths from the root to the leaves. Figure 1, shows a permutation tree for four items.
A permutation tree of n items is generated recursively; at the root all n items are available for selection, so the root has n edges of different colors leading to n subtrees. At the subtrees of the root only n-1 items are available (one item has been selected), so the subtrees of the root each have n-1 edges leading to n-1 subtrees. The color of the missing edge corresponds to the the item that has been selected, and each subtree itself is a permutation tree for n-1 items (see Figure 1). This process is repeated with the subtrees of the subtrees, and so on, until the leaves are reached. At that point all items have been selected; therefore, no edges are available to continue the process.
This construction of the permutation tree assures that all permutations of n items correspond to paths in the tree. Also, no two distinct paths correspond to the same permutation of items. This implies that there are as many permutations of n items as there are paths in a permutation tree for n items.
Counting the number of paths in a permutation tree is fairly straightforward; at the root n paths emerge, and each of these n paths splits into n-1 paths at the subtrees of the root. This gives n*(n-1) paths, and each one of these splits into a further n-2 paths at the next level in the permutation tree. The paths terminate at the leaves by which time the original n paths have split into n! (n factorial) paths, and that equals:
n! = n*(n-1)*(n-2)*..*1
An algorithm for selecting random permutations must assure that it has equal chance of picking any one of the n! possible permutations. Figure 2, suggests a simple algorithm for generating random permutations; to get a permutation of n items, simply pick a random path from the root to the leaves in a permutation tree for n items.
A random path can be picked by starting at the root, randomly picking one of the available edges to get to a subtree, and repeating that process to get to successive subtrees until a leaf is reached. The animation below illustrates this process.
Intuitively, this process works because the random path picking algorithm is not biased towards any particular path; therefore, all n! paths have equal chance of being picked. More formally, the probability of picking a path consisting of edges {E1,E2,..,En} is the probability of picking the path leading to edge En, and multiplying by the probability of picking edge En given that the path leading to En has been picked.
P({E1,E2,..,En}) = P({E1,E2,..,En-1}) * P(En|{E1,E2,..,En-1})
The above equation can be recursively expanded by substituting in the value for P({E1,E2,..,Ek}) in the right hand side of the equation.
P({E1,E2,..,En}) = P({E1,E2,..,En-1}) * P(En|{E1,E2,..,En-1}) = P({E1,E2,..,En-2}) * P(En-1|{E1,E2,..,En-2}) * P(En|{E1,E2,..,En-1}) .. = P(E1|{}) * P(E2|{E1}) * .. * P(En|{E1,E2,..,En-1}) = P(E1) * P(E2|{E1}) * .. * P(En|{E1,E2,..,En-1})
The probability P(Ek+1|{E1,E2,..,Ek}) can be computed readily. At the point where edge Ek+1 is in consideration k edges have been selected, and edges Ek+1 to En are available. Edge Ek+1 is randomly chosen from all the available choices, so there are n-k choices for it. Therefore, the probability of picking edge Ek+1 is 1 / (n-k). The following expression gives the probability of picking a particular path in a permutation tree for n items:
P({E1,E2..En}) = P(E1) * P(E2|{E1}) * .. * P(En|{E1,E2,..En-1}) = 1/n * 1/(n-1) * .. * 1 = 1/n!
The above expression implies that the path picking algorithm described above picks any path with probability 1/n!; therefore, the algorithm picks all permutations with equal probability.
The random path picking algorithm needs to be efficient; generating the whole permutation tree in advance is not practical. Fortunately, a complete permutation tree is not needed. Each iteration of the algorithm only needs to keep track of the path chosen so far, and the possible selection of choices for the next edge of the path; Figure 2, illustrates this idea.
There are a maximum of n outgoing edges at any node of the permutation tree, and these can be modeled as numbers 1 to n in an array. Selecting an item frees up space for one edge, and a clever algorithm can use that space to store the path. An algorithm that implements these ideas is given below:
for (i=1 to n) ar[i] = i; for (i=1 to n) swap(ar[i], ar[Random(i,n)]);
The above algorithm partitions the elements of an array of n elements into two parts, positions 1 to i-1 and positions i to n. Positions 1 to i-1 contain the segment of the path chosen so far, and positions i to n contain the unselected edges. The first loop of the algorithm is just initialization of the array elements, and the second loop does all the work in this algorithm.
Each iteration of the second loop extends the path selected so far by one edge. This is done by picking a random edge from the unselected edges and moving it to position i. Of course, the edge at position i has to go somewhere; therefore, the edge at position i is moved to the former position of the selected edge. The table below illustrates this algorithm for picking a 5 item permutation.
Iteration | Selected Edge | Selected Path | Unselected Edges |
? | 1 2 3 4 5 | ||
1 | 4 | 4 | 2 3 1 5 |
2 | 1 | 4 1 | 3 2 5 |
3 | 3 | 4 1 3 | 2 5 |
4 | 5 | 4 1 3 5 | 2 |
5 | 2 | 4 1 3 5 2 |
In the above table, array ar is the concatenation of the last two columns. At each iteration one edge is chosen and moved to position i. Unlike the permutation tree, unselected edges in array ar are in no specific order. This is all right as the algorithm chooses amongst the unselected edges randomly, and any specific ordering of the edges is irrelevant.
Another thing to notice is that after the second last iteration of the algorithm the contents of the array ar do not change. At the last iteration only one edge is left unselected; going through the iteration simply swaps it with itself and results in no change. This suggests that the last iteration of the algorithm is redundant and the algorithm can be changed to the following:
for (i=1 to n) ar[i] = i; for (i=1 to (n-1)) swap(ar[i], ar[Random(i,n)]);
Multiple random permutations can be generated by repeating the second loop. As long as array ar contains a permutation of numbers 1 to n, it can be reused without reinitialization.
Choosing a random k-permutation turns out to be fairly simple as well. A kpermutation is a rearrangement of k items selected from a total of n items. Kpermutations are a generalization of permutations. A permutation is simply a rearrangement of all n items; a permutation is equivalent to a kpermutation when k equals n.
Kpermutations too can be visualized as paths in a permutation tree. Kpermutations correspond to paths of length k from the root of the tree. A kpermutation tree can be generated by pruning the permutation tree so that all possible paths from the root to the leaves are of length k. Figure 3, shows such a tree for n = 5 and k = 3. This correspondence with paths suggests the following algorithm for generating kpermutations:
for (i=1 to n) ar[i] = i; for (i=1 to k) swap(ar[i], ar[Random(i,n)]);
The algorithm above is a modified version of the algorithm for generating permutations; that algorithm yields a path of length n while this one yields a path of length k. Also, it is not possible to optimize this algorithm to cut down one iteration as was the case with the previous algorithm. The optimization in the algorithm for random permutations was possible only because at the last iteration only one edge was left unselected. However, with n less than k more than 1 edges will be available for selection at the kth iteration, and randomly choosing one of the edges will make a difference.
The final algorithm is for choosing a random ksubset. A ksubset is simply a kpermutation where the order of items is unimportant. The lists {1,3}, {3,1} are different kpermutations of the items [1,2,3], but they are the same ksubset. While the lists {1,2} and {1,3} are different ksubsets as well as different kpermutations of [1,2,3].
Paths from the root to the leaves in the kpermutation tree enumerate all possible kpermutations; therefore, they also enumerate all possible ksubsets of n items. The only problem is that each ksubset is enumerated multiple times. This is not an issue as each ksubset appears exactly k! times; once for each permutation of the items in the ksubset. As each ksubset occurs as often as any other all ksubsets have the same probability of being picked at random.
Some problems require ksubsets that are arranged in a sorted order. For example, {1,2,3} and {1,3,2} are the same ksubset but {1,2,3} is in ascending order while {1,3,2} is unsorted. For small k, sorting is inexpensive; therefore the best approach to generating ordered ksubsets in such cases is to simply sort the elements of the ksubset after generating it.
The algorithms for random kpermutations and ksubsets use O(n) space and O(n) time to generate objects of size O(k). O(n) space is needed for the array to store the chosen path and the unselected edges, and O(n) time is needed for initializing the elements of this array. The time and space complexity is not a problem when n and k do not differ by too much, but if n is sufficiently large and k is small, the algorithms can be very inefficient. Part II of this article will discuss techniques to bring the space and time complexity of these algorithms to O(k).
LAST UPDATED by Usman Latif [Feb 13, 2004]
Part II – Random KPermutations and KSubsets in O(k) space
The first part of this article described some algorithms for the generation of random permutations, kpermutations and ksubsets. The algorithm for generating permutations is very efficient; however, the algorithms for generating kpermutations and ksubsets can be improved significantly.
The following algorithm was given for the generation of random kpermutations, and ksubsets:
for (i=1 to n) ar[i] = i; for (i=1 to k) swap(ar[i], ar[Random(i,n)]);
The big problem with this algorithm is that it uses space inefficiently. Selecting a Kpermutation of size k from n elements can require O(n) space. For instance, when selecting a KPermutation of 1000 items from a collection of 100,000 items, the algorithm requires an array of 100,000 elements. This is clearly suboptimal and undesirable. The algorithm also initializes the array of 100,000 elements, and this results in O(n) time as well.
Ideally the algorithm should use only O(k) space and O(k) time. This turns out to be possible but requires a few changes to the basic algorithm. The crucial insight is that while generating a Kpermutation, the algorithm disturbs at most O(k) elements in the array containing the item numbers. This allows using a sparse array of O(k) space to represent an array of n elements.
The sparse array contains the numbers 1 to n arranged in ascending order, and it is modeled by a hashtable. Elements that get disturbed by the kpermutation algorithm are placed in the hashtable, and all other elements are assumed to be in their correct places in the array. For example, an array with the following contents:
1 2 3 4 6 5 7 8 9
can be represented by a hashtable containing the following key-value pairs:
(5 6) (6 5)
An array reference for element i is satisfied by first searching for the key i in the hashtable, and returning the associated value if it is in there; if the key is not in the hashtable the value i is returned. For example, an array reference for 3 in the above array returns 3, but an array reference for 6 returns 5 (the key 6 is in the hashtable).
The kpermutation algorithm only uses a swap operation, so by implementing a swap operation for the sparse array representation the original algorithm can be used without any significant modifications. The algorithm will change to the following pseudo code:
ar = init_htable(); for (i=1 to k) swap_keys(ar, i, Random(i,n));
The swap_keys operation is fairly straightforward to implement, but as the keys might or might not be present in the hashtable, there are four cases to be handled. The four cases and the actions required to swap the keys are summarized below:
Key i Key j Action present present swap values, i.e., (i,n)(j,m) -> (i,m)(j,n) present absent add (j,j) to htable and swap values absent present add (i,i) to htable and swap values absent absent add (i,i) (j,j) to htable and swap values
The algorithm takes O(k) space and O(k) time as the swap_keys function executes k times, and on each invocation it uses O(1) time and can allocate at most O(1) additional space. After the algorithm has finished executing, the kpermutation can be retrieved from the hashtable by retrieving keys 1 to k in order. Actually, the algorithm can be easily optimized so that the kpermutation is directly stored in a separate array. Other optimizations are also possible but they don’t improve the O(k) time and space behavior of the algorithm beyond constant factors.
Unlike the original random kpermutation algorithm, this algorithm requires a reinitialization of the hashtable whenever a new kpermutation needs to be generated. Generating many kpermutations without reinitialization of the hashtable will lead the algorithm to use more than O(k) space; this behavior being the result of the swap_keys operation executing more than k times.
As discussed in part I of this article, the random ksubset generation algorithm is the same as the kpermutation algorithm, so the above algorithm can generate ksubsets using O(k) space and O(k) time. However, if the ksubset is required to be in sorted order then a bucket sort can be used to sort the output of this algorithm. A bucket sort will work very well in this scenario as the items of the random ksubset will be distributed randomly over the interval 1 to n. Combining bucket sort with this algorithm will allow sorted ksubsets to be generated in O(k) time.
LAST UPDATED by Usman Latif [Mar 26, 2004]