Optimized QuickSort — C Implementation (Non-Recursive)
August 2005, July 2007



NOTE 2010.02.25: I’ve received a few e-mails over the past few years telling me that my implementation of QuickSort may not be an improvement over the popular, recursive implementation.  When I have time, I plan to perform my own comprehensive speed tests.  Until then, don’t take the below information as gospel!  Perform your own speed tests on your own system, as needed.


Wikipedia contains a C language implementation of QuickSort which looks like this: [UPDATE:  Wikipedia’s content is constantly shifting, and is no longer accurately reflected by the following code sample. Also, Jiri Horky has notified me that the below code contains a couple of errors. It was captured in August 2005, but Wikipedia doesn’t seem to go back that far. Their October 2005 version includes Horky’s fixes.]


void swap(int *a, int *b)
{
  int t=*a; *a=*b; *b=t;
}
void sort(int arr[], int beg, int end)
{
  if (end > beg + 1)
  {
    int piv = arr[beg], l = beg + 1, r = end;
    while (l < r)
    {
      if (arr[l] <= piv)
        l++;
      else
        swap(&arr[l], &arr[--r]);
    }
    swap(&arr[--l], &arr[beg]);
    sort(arr, beg, l);
    sort(arr, r, end);
  }
}


This Wiki code is closely replicated on many other sites, and probably in many in-use implementations.  I have rewritten it here to execute more efficiently, as follows:


//  quickSort
//
//  This public-domain C implementation by Darel Rex Finley.
//
//  * Returns YES if sort was successful, or NO if the nested
//    pivots went too deep, in which case your array will have
//    been re-ordered, but probably not sorted correctly.
//
//  * This function assumes it is called with valid parameters.
//
//  * Example calls:
//    quickSort(&myArray[0],5); // sorts elements 0, 1, 2, 3, and 4
//    quickSort(&myArray[3],5); // sorts elements 3, 4, 5, 6, and 7

bool quickSort(int *arr, int elements) {

  #define  MAX_LEVELS  1000

  int  piv, beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R ;

  beg[0]=0; end[0]=elements;
  while (i>=0) {
    L=beg[i]; R=end[i]-1;
    if (L<R) {
      piv=arr[L]; if (i==MAX_LEVELS-1) return NO;
      while (L<R) {
        while (arr[R]>=piv && L<R) R--; if (L<R) arr[L++]=arr[R];
        while (arr[L]<=piv && L<R) L++; if (L<R) arr[R--]=arr[L]; }
      arr[L]=piv; beg[i+1]=L+1; end[i+1]=end[i]; end[i++]=L; }
    else {
      i--; }}
  return YES; }


Advantages of my code over the Wikipedia code:

1. FUNCTION CALLS  My implementation avoids function calls.  Calling (and returning from) a function is time-consuming, and not because the content of the function takes time to execute — just getting into the function, and back out of it, uses processor time.

2. STACK  My implementation does not use the stack to store data.  Recursive functions look neat and simple in part because they don’t have to have big arrays like my beg[] and end[].  But all they’re really doing is using the stack as their own private array.  This is much slower than using a real array, and could cause stack overflow on some systems, crashing the app that called the quicksort.  My implementation simply returns NO if this kind of failure occurs.

3. SWAP VARIABLE  In any one pivot round, the Wikipedia version can pass items through a swap variable many times.  My code passes only one item (the pivot) through a swap variable, and only once per round.

4. MULTIPLE MOVES  In any one pivot round, the Wikipedia version can move the same item more than once in the list.  My code never moves an item to a new position in the list more than once per round.

5. UNNECESSARY MOVES  In any one pivot round, the Wikipedia version will sometimes move items to new positions in the list even though they were already on the correct side of where the pivot item was going to wind up.  My code never moves an item if its current position is on the correct side of the pivot’s destination.

6. SELF-SWAPS  In each pivot round, the Wikipedia version has about a 50% chance of swapping one of the list items with itself.  (And this swap incurs the costs described above in #1, #2, and #3.)

(Note that the benefits of all six advantages are greatly increased if the items being sorted are substantially sized data structures, as opposed to simple integers as in these code examples.)


Limitations of my code vs. the Wiki code:

1. COMPARISONS  My code performs more comparisons, on average, than the Wiki version.  Wiki does 30% fewer comparisons by my (very rough) estimate.


Both functions seem to do about the same number of increments/decrements, as far as I can tell.


Here’s an example of how my code handles one pivot round (a.k.a. “partition”):

piv [0] [1] [2] [3] [4] [5] [6] [7]
? 4 5 3 6 2 8 1 7     Starting condition.
    Copy first item to pivot stash.
4 4 5 3 6 2 8 1 7
    Set L and R to the span of the partition.
    Decrement R until it finds an item that is less than the pivot value.
    Copy that item to position L.
4 1 5 3 6 2 8 1 7
    Increment L until it finds an item that is more than the pivot value.
    Copy that item to position R.
4 1 5 3 6 2 8 5 7
    Decrement R until it finds an item that is less than the pivot value.
    Copy that item to position L.
412362857
    Increment L until it finds an item that is more than the pivot value.
    Copy that item to position R.
412366857
    Decrement R — now L is not less than R, so...
    ...copy the pivot to position L.
4 1 2 3 4 6 8 5 7     Done!
piv [0] [1] [2] [3] [4] [5] [6] [7]

Notice that the data values 3, 8, and 7 (in slots [2], [5], and [7], respectively) do not move because they don’t need to.  All other items (except the pivot) are moved only once, and each item is compared with the pivot only once.


Now — here’s how the Wiki code handles the same pivot round.  (I will assume that the programmer has the good sense to declare the “swap” function “inline”, so that there is no function-call overhead during the processing of a pivot round.)

piv T [0] [1] [2] [3] [4] [5] [6] [7]
? ? 4 5 3 6 2 8 1 7     Starting condition.
    Copy first item to pivot stash.
4 ? 4 5 3 6 2 8 1 7
    Initialize L and R.
    Increment L until it finds an item larger than the pivot.
    Decrement R.
    Swap items L and R through swap var T.
4 5 4 5 3 6 2 8 1 7
4 5 4 7 3 6 2 8 1 7
4 5 4 7 3 6 2 8 1 5
    Increment L until it finds an item larger than the pivot.
    Decrement R.
    Swap items L and R through swap var T.
4 7 4 7 3 6 2 8 1 5
4 7 4 1 3 6 2 8 1 5
4 7 4 1 3 6 2 8 7 5
    Increment L until it finds an item larger than the pivot.
    Decrement R.
    Swap items L and R through swap var T.
4 6 4 1 3 6 2 8 7 5
4 6 4 1 3 8 2 8 7 5
4 6 4 1 3 8 2 6 7 5
    Increment L until it finds an item larger than the pivot.
    Decrement R.
    Swap items L and R through swap var T.
4 8 4 1 3 8 2 6 7 5
4 8 4 1 3 2 2 6 7 5
4 8 4 1 3 2 8 6 7 5
    Increment L — now L is not less than R, so...
    ...decrement L...
    ...and swap item L and the first item through swap var T.
4 2 4 1 3 2 8 6 7 5
4 2 4 1 3 4 8 6 7 5
4 2 2 1 3 4 8 6 7 5     Done.
piv T [0] [1] [2] [3] [4] [5] [6] [7]

Of the data values 3, 8, and 7 (none of which need to move) only 3 is left alone, and 8 is moved twice.  The swap variable is used five times (or five and a half if you count copying out the pivot value, which is part of the single swap my code uses).


No-Fail Version

Here’s a modified version that never fails — i.e. it never runs into its MAX_LEVELS limit:

//  quickSort
//
//  This public-domain C implementation by Darel Rex Finley.
//
//  * This function assumes it is called with valid parameters.
//
//  * Example calls:
//    quickSort(&myArray[0],5); // sorts elements 0, 1, 2, 3, and 4
//    quickSort(&myArray[3],5); // sorts elements 3, 4, 5, 6, and 7

void quickSort(int *arr, int elements) {

  #define  MAX_LEVELS  300

  int  piv, beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R, swap ;

  beg[0]=0; end[0]=elements;
  while (i>=0) {
    L=beg[i]; R=end[i]-1;
    if (L<R) {
      piv=arr[L];
      while (L<R) {
        while (arr[R]>=piv && L<R) R--; if (L<R) arr[L++]=arr[R];
        while (arr[L]<=piv && L<R) L++; if (L<R) arr[R--]=arr[L]; }
      arr[L]=piv; beg[i+1]=L+1; end[i+1]=end[i]; end[i++]=L;
      if (end[i]-beg[i]>end[i-1]-beg[i-1]) {
        swap=beg[i]; beg[i]=beg[i-1]; beg[i-1]=swap;
        swap=end[i]; end[i]=end[i-1]; end[i-1]=swap; }}
    else {
      i--; }}}


This might be slightly slower than the first one, but it will never fail because it always performs the smaller partition first.  Hence, the only way it could run into the limit is if the to-be-sorted array was at least 2MAX_LEVELS elements in size.  Since 2300 is greater than 1090, and there are only about 1080 fundamental particles in this universe from which a human-made computer can be built, no larger limit will ever be needed.

(Note: Someone reminded me that a typically 64-bit index variable can index only 264 items, not 2300. That’s true, but if you’re using a 64-bit computer, you’re probably not going to have an array of more than 264 elements anyway, even if each element was only one byte.)


Bugs?  Performance tests?  Questions?  I’d love to hear about it — please shoot me an e-mail.


Back to Tutorials.