Thursday, January 14, 2010

Quick sort

C. A. R. Hoare (Tony Hoare) discovered it in 1961.

Why quicksort?

  • Definitely a greatest-hits algorithm
  • Prevalent in practice
  • O(n log n) time "on average"
  • Minimal extra memory needed (which gives it an edge over merge sort)
The sorting problem
Input: an array of numbers, unsorted.

Output: the same numbers sorted in some order, say increasing order.
Quicksort solves this problem.

What is quick sort?
Quick Sort is a comparison-based sorting algorithm that sorts data items into ascending or descending order.

This algorithm works as follows:

  1. Pick a pivot element of the array, say the first element.
  2. Rearrange the array such that:
    -- everything to the left of the pivot is less than the pivot
    -- everything to the right of the pivot is greater than the pivot

    Note that the elements left of the pivot need not be sorted among themselves; they only need to be less than the pivot.
  3. Now, recursively sort the two halves separately using steps 1 and 2.

So the key concept behind Quicksort is the partitioning around a pivot. Choose one of the elements in the array as a pivot. Then rearrange the array so that everything less than the pivot is to its left and everything greater is to its right. Note that this rearrangement does not mean the entire array has to be sorted after this one partition. However, the pivot does end up in the "right" spot. That is, the pivot will not be moved anymore because it is already in the spot where it will be in the sorted array.

2 cool facts about partitioning

  • Partitioning can be done in linear time without allocating any extra memory
  • It reduces the problem size, i.e. it allows us to use a divide-and-conquer approach

Pseudocode of quicksort
 QuickSort(Array a, length n)
    if n <= 1 return
    p = ChoosePivot(a, n)
    Partition a around p
    QuickSort(first part)
    QuickSort(second part)


For the time being, let's choose the pivot as the first element.

Partitioning 
Partitioning using linear extra space
The easiest way to partition the array is to allocate a temporary array, using O(n) extra space. Scan the input: whenever you find an element smaller than the pivot, place it at the start of the temporary array, and whenever you find one larger than the pivot, place it at the end. After the complete scan, exactly one hole remains; place the pivot there and you are done.


In-place version of partitioning
This will still partition around the pivot in O(n) time. There are various versions of partitioning.

Version 1 
Assuming the pivot is the first element in the array, you traverse the array, keeping a partitioned part and an unpartitioned part. You use one pointer (j) to traverse the array and another (i) to point to the first element in the partitioned part that is larger than the pivot. When you find an element smaller than the pivot, you swap it with the element at i and advance i. When you are finished, swap the element just before i with the pivot to complete the partitioning.

Partition(a,left,right)
{
   p = a[left];
   i = left+1;
   for (j = left+1; j <= right; j++)
   {
      if (a[j] < p)
      {
         swap(a[j], a[i]);
         i++;
      }
   }
   swap(a[left], a[i-1]);
}
Let's dry run this version.
Initially left = 0 and right = 8, the boundaries of the array. i and j both start at 1, and the pivot is 40. i denotes the boundary between the elements less than the pivot and those greater than it; j denotes how far we have scanned so far.

Now, whenever a[j] < p, we swap these two elements and increment i, since i is the boundary of the elements less than the pivot.
Here we compare 10 with 40; since 10 is less than 40, we swap 20 and 10 (i.e. a[i] and a[j]) and increment i.




Now, i and j point to 20. We increment j. j = 3.
a[j] > 40, so we continue and increment j again.

j=4, a[j] > 40, so we increment j again.

j=5, a[j] > 40. so we increment j again.

j=6, a[j] = 7. This time a[j] < 40, so we swap 7 with 80 (i.e. a[j] with a[i]) and increment i by 1, so now i = 4. We continue this way until the scan finishes and the array is properly partitioned around the pivot.

Version 2:


Partition(a,left,right)
{
   p = a[left];
   l = left+1;
   r = right;
   while (l < r) {
       while (l < right && a[l] < p) l++;
       while (r > left && a[r] >= p) r--;
       if (l < r)   swap a[l],a[r]
   }
   swap a[left],a[r]   // place the pivot in its final position
}



Dry running version 2

Initial condition: left=0, right = 8, p = a[left] = 40, l = 1, r = 8




Now we execute the outer while loop. While l is less than r and a[l] is less than the pivot, we just increment l. For example, a[1], i.e. 20, is less than 40, so we increment l.
Next we have a[2], i.e. 10, which is still less than 40, so we increment l further.
Next we reach 80, which is more than 40; since a[l] > pivot, we exit this inner loop.
Now we start the second while loop, checking whether a[r] >= pivot.
Since a[8], i.e. 100, is greater than the pivot 40, we decrement r by 1. We then exit this loop as well, because a[7], i.e. 30, is less than the pivot 40. Now, since l < r, we swap a[l] and a[r].
So, after the swap, 80 and 30 have exchanged places.

Now we continue with the outer while loop, since l < r still holds.

Implementation in C++

Following is the implementation in C++.
Partition (based on version 2):
int Partition(int *darr, int lb, int ub)
{
    int a;                  /* key holder (the pivot) */
    int up, down;
    int temp;

    a = darr[lb];
    up = ub;
    down = lb;

    while (down < up)
    {
        while (down < ub && darr[down] <= a)   /* scan the keys from left to right;   */
            down++;                            /* the bound check stops the scan when */
                                               /* the pivot is the largest key        */
        while (darr[up] > a)                   /* scan the keys from right to left    */
            up--;
        if (down < up)
        {
            temp = darr[down];                 /* interchange records */
            darr[down] = darr[up];
            darr[up] = temp;
        }
    }

    darr[lb] = darr[up];    /* move the last key <= pivot to the front */
    darr[up] = a;           /* put the pivot into its final position   */

    return up;
}

Quicksort method based on partition:
void quick_sort(int *darr, int lb, int ub)
{
    int up;                      /* final position of the pivot */

    if (lb >= ub)
        return;

    up = Partition(darr, lb, ub);
    quick_sort(darr, lb, up-1);  /* recursive call - sort first subtable  */
    quick_sort(darr, up+1, ub);  /* recursive call - sort second subtable */
}


Analysis of Quicksort
Another cool fact about QuickSort is that any two elements in the array are compared either once or never. If you think about it, the only comparisons that ever occur are between the pivot and the other elements in [start, end]. Once these comparisons are over, there are two recursive calls to QuickSort, neither of which includes the pivot at all. Thus each pair involving that pivot is compared at most once.

Finding the Optimal Pivot
So far, we have used the first element of the array as the pivot. What if the array were already sorted, or reverse sorted? Then partitioning would leave the pivot either in the same spot or at the back, and we would recursively call QuickSort on a subarray of length n-1. We repeat this over and over until the array is sorted, which takes O(n²) time. This is the worst-case scenario of quicksort.

How to choose pivots?
What would be a better way? If we want to balance the two subarrays so that they are about equal in length, then we want to find the median. The median of the array is the element such that half of the elements are less than it and the other half are greater. Thus, if we are lucky enough to get the median as the pivot every time, the algorithm behaves much like MergeSort without the extra memory allocation, running in O(n log n).

The big idea is to choose the pivot randomly. Every time we are passed an array of k elements, we choose the pivot uniformly at random from those k elements. So every call of the algorithm gives a different execution, even on the same input.

As a side note, this is an instance of adding randomness to our algorithm. Randomness is actually really useful in many applications because it makes things easier, more elegant, and efficient, but more on this later.


What if finding the median is too hard? What other choices do we have? Well, it is established that if every split is at least 25%/75%, the running time is still O(n log n). If we had 100 elements, we would just have to pick a pivot anywhere from the 25th smallest to the 75th smallest. That is half of all the elements, so a random pivot lands in this range with probability 50%.


In conclusion, we have established that randomized QuickSort has an average running time of O(n log n), a blazingly fast algorithm without the extra space allocation. With a fixed first-element pivot, however, it performs very poorly on already-sorted arrays, hitting the worst-case running time of O(n²).
