Today I want to talk about sorting algorithms. A sorting algorithm is one that takes an unordered list and returns it ordered. Various orderings can be used, but for this article we will focus on numeric ordering, i.e. 5, 3, 4, 1, 2 => 1, 2, 3, 4, 5. I’m going to present three different sorting algorithms, walk through how they work, and compare their efficiency using Big-O notation, which I will briefly explain. All my examples will be in Python.
All of these algorithms fundamentally do the same thing, and the practical difference between them is in their performance. We will describe their performance using Big-O notation. Big-O, along with Big-Omega and Big-Theta, describes the performance of an algorithm by estimating the number of operations required as the size of the input approaches infinity.
The notation format is O(g(n)), Ω(g(n)), and Θ(g(n)) respectively for Big-O, Big-Omega, and Big-Theta. g(n) represents the complexity of the algorithm f(n) and tells us how the algorithm’s complexity will scale as the input increases. An example of Big-O notation might look like O(n²). This tells us that the algorithm’s complexity scales in quadratic time, or in other words that the number of operations grows at most on the order of n*n.
The exact computational complexity of an algorithm will always vary with the subtleties of its input — for example, feeding a sorted list into an efficient sorting algorithm should take a lot less work than feeding in an unsorted list of the same size. With Big-whatever notation we make generalized statements about an algorithm’s performance like ‘this algorithm is at least this complex’ or ‘this algorithm is at most that complex.’ To represent these different claims we use Big-O, Big-Omega, or Big-Theta. Big-O looks at the maximum number of operations that will be required, Big-Omega looks at the minimum, and Big-Theta looks at both boundaries — or in more technical terms, Big-Theta shows how tightly bound the algorithm is.
With Big-O (and Omega/Theta) we are making approximations. We disregard every operation in the algorithm except the one that scales the least efficiently. We can do this because as the input grows toward infinity, the computationally complex parts of the algorithm will drastically outpace the less complex ones.
For example, imagine we had a function that loops through an array, checking if each element matches a value, and then returns the number of matching elements after the loop. The number of operations required to complete the for loop increases linearly with the size of the input array (O(n)), while returning the count only needs to happen once (O(1)). Assuming an array of size n, the loop is n operations whereas the return is always 1 operation. If we were assuming an input of n=1, that single return operation would be a substantial share of the work. However, in Big-O we are assuming extremely large inputs approaching infinity. In that situation the single return is inconsequential in comparison to the n-sized loop and can be safely ignored.
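Here’s a minimal sketch of that kind of function (the name countMatches is just something I made up for illustration):
def countMatches(arr, val):
    count = 0
    # O(n): one comparison per element in the input array
    for el in arr:
        if el == val:
            count += 1
    # O(1): returning the count happens once, regardless of input size
    return count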
Similarly, if there were a portion of the algorithm that required n² operations — such as a double for loop where you iterate through the array and compare each element to all other elements — followed by a final single O(n) size loop through the array, we would be able to safely disregard the O(n) portion of the algorithm for our analysis and call the algorithm O(n²). Even if the input was something small like n=1000, the O(n²) portion would require 1,000 times more operations than the O(n) portion.
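A rough sketch of that shape might look like this (the function and variable names are made up for illustration); the nested loops dwarf the final single loop:
def quadraticThenLinear(arr):
    # O(n^2): compare each element to every other element
    duplicatePairs = 0
    for i in range(len(arr)):
        for j in range(len(arr)):
            if i != j and arr[i] == arr[j]:
                duplicatePairs += 1
    # O(n): one final pass through the array
    total = 0
    for el in arr:
        total += el
    # n^2 + n operations overall, which we simply call O(n^2)
    return duplicatePairs, total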
I’m going to focus on Big-O and ignore Omega and Theta — to be honest this stuff is all new to me and I’m most comfortable dealing with Big-O. I’ve given a couple examples of Big-O notation already but here are a few more common ones with simple examples:
def func(arr):
    return arr[0]
This function returns the first element of the input array and will always take a single operation to complete, regardless of the size of the input.
def printMatch(arr, val):
    for el in arr:
        if el == val:
            print(el)
            return
The for loop in this function will always take one operation per element in the input array. O(n) functions like this have a linear increase in complexity.
def nlog(n):
    if n <= 1:
        return
    else:
        nlog(n // 2)
This function, while entirely useless, is a really simple demonstration of O(log(n)). This recursive function calls itself repeatedly on the input value divided by two until n reaches 1. This repeated halving of the input with each call is indicative of O(log(n)) functions. Initially their difficulty increases at a fast pace, but it quickly levels off as the input gets larger.
def containsDuplicate(arr):
    for i in range(0, len(arr)):
        for j in range(0, len(arr)):
            if i != j:
                if arr[i] == arr[j]:
                    return True
    return False
The outer for loop runs once per element in the input array, but every time the outer loop iterates, the inner one completes a full loop over the entire array. O(n²) algorithms like this have a complexity equal to the square of the input size and are said to run in quadratic time.
def countFibonacci(num):
    if num <= 1:
        return num
    else:
        return countFibonacci(num - 2) + countFibonacci(num - 1)
This last example is a recursive function that returns the nth Fibonacci number. Without getting into the details, this is a recursive algorithm where the number of additions grows at an exponential rate (roughly O(2ⁿ)), which results in massive numbers of operations as n increases.
Here is a graph showing you how the number of operations increases relative to the size of input n.
Now that we have a basic understanding of Big-O, let’s move on to the sorting algorithms.
This first algorithm, Bubble Sort, uses a double for loop to iterate through the array while repeatedly comparing and swapping out-of-order adjacent elements. The inner for loop does all the swapping and the outer for loop ensures there is one pass through the inner loop per array element. When an out-of-place element is found it gets dragged along by the repeated swapping until it reaches its correct location.
Here is a walk through of the algorithm:
Pass One:
[ 5, 3, 4, 1, 2 ] 5 > 3 Swap
[ 3, 5, 4, 1, 2 ] 5 > 4 Swap
[ 3, 4, 5, 1, 2 ] 5 > 1 Swap
[ 3, 4, 1, 5, 2 ] 5 > 2 Swap
[ 3, 4, 1, 2, 5 ]
Pass Two:
[ 3, 4, 1, 2, 5 ] 3 < 4 No Swap
[ 3, 4, 1, 2, 5 ] 4 > 1 Swap
[ 3, 1, 4, 2, 5 ] 4 > 2 Swap
[ 3, 1, 2, 4, 5 ] 4 < 5 No Swap
Pass Three:
[ 3, 1, 2, 4, 5 ] 3 > 1 Swap
[ 1, 3, 2, 4, 5 ] 3 > 2 Swap
[ 1, 2, 3, 4, 5 ] 3 < 4 No Swap
In this example we only needed 3 passes to finish sorting. However, in a worst case scenario where the list was fully reversed we would need one pass per element. Each pass iterates through the full array to do the comparisons and swaps, putting us in O(n²) time.
Here is the sample code:
def bubbleSort(arr):
    # Loop through the list once per element
    for _ in range(0, len(arr)):
        for i in range(0, len(arr) - 1):
            # Compare and swap adjacent elements
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
    return arr
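As a quick sanity check on the walkthrough example:
print(bubbleSort([5, 3, 4, 1, 2]))  # [1, 2, 3, 4, 5]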
This algorithm is an improvement over Bubble Sort in that it makes at most one swap per pass rather than one swap per inversion. Like Bubble Sort, Selection Sort is O(n²). However, the two only appear to be computationally equivalent because Big-O ignores the lower complexity parts of an algorithm. The real world effect of the reduction in swaps is an improvement in efficiency over Bubble Sort, as the swap-counting sketch after the Selection Sort code below illustrates.
In Bubble Sort, elements are dragged along from index to index through repeated swaps. In Selection Sort, we iterate through the list and find the largest out of place element and then make a single swap to place it in the correct location.
This is achieved by an outer loop counting down from the last (largest) position of the array and an inner loop counting up from index 0 to i, where j is the inner iterator and i is the outer iterator. The inner loop tracks the index of the largest element it has seen, and once it finishes, that element is swapped into position i.
Here is a walk through of Selection Sort. For this one I’ll keep track of the outer iterator i.
[ 5, 3, 4, 1, 2 ]  i = 4: 5 is the largest element in range 0-i, so swap it with arr[i]
[ 2, 3, 4, 1, 5 ]  i = 3: 4 is the largest element in range 0-i, so swap it with arr[i]
[ 2, 3, 1, 4, 5 ]  i = 2: 3 is the largest element in range 0-i, so swap it with arr[i]
[ 2, 1, 3, 4, 5 ]  i = 1: 2 is the largest element in range 0-i, so swap it with arr[i]
[ 1, 2, 3, 4, 5 ]  done!
Here is the Selection Sort code:
def selSort(arr):
    for i in range(len(arr) - 1, 0, -1):
        # Temporary storage for index of highest element
        maxIndex = 0
        # Iterate from index 0 to index i inclusive
        for j in range(0, i + 1):
            # Find highest value index
            if arr[j] > arr[maxIndex]:
                maxIndex = j
        # Store element to be swapped out
        temp = arr[i]
        # Swap the two elements
        arr[i], arr[maxIndex] = arr[maxIndex], temp
    return arr
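To make the difference in swaps concrete, here is a rough sketch of an experiment (the instrumented copies and the fully reversed worst-case input are my own choices for illustration):
def bubbleSortSwaps(arr):
    # Same as bubbleSort, but counts swaps
    swaps = 0
    for _ in range(0, len(arr)):
        for i in range(0, len(arr) - 1):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swaps += 1
    return swaps

def selSortSwaps(arr):
    # Same as selSort, but counts swaps
    swaps = 0
    for i in range(len(arr) - 1, 0, -1):
        maxIndex = 0
        for j in range(0, i + 1):
            if arr[j] > arr[maxIndex]:
                maxIndex = j
        arr[i], arr[maxIndex] = arr[maxIndex], arr[i]
        swaps += 1
    return swaps

worst = list(range(100, 0, -1))   # fully reversed list
print(bubbleSortSwaps(worst[:]))  # 4950 swaps, one per inversion
print(selSortSwaps(worst[:]))     # 99 swaps, one per pass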
This last algorithm, Merge Sort, is by far the most efficient of the three, clocking in at O(n log(n)). It is a recursive algorithm which falls under the classification of Divide and Conquer (D&C) algorithms. It also has some interesting side effects which give it a number of uses outside of sorting numbers.
D&C is a basic design pattern for algorithms which follows three steps:
- Divide the input into sub-problems that are smaller instances of the same type of problem.
- Conquer the sub-problems recursively.
- Clean up and recombine the results.
In D&C we recursively break down a large problem into smaller and smaller pieces until they become easy to solve, and then build the results back up into a solution to the larger problem. In the case of Merge Sort we use two recursive calls to split the problem in half over and over until we get to the base case of a single-element array.
In the diagram above we have made two recursive calls on the two halves of the array. This creates a tree structure with two branches at every level.
Once we reach the base case and start returning actual values back up the tree, we can start sorting. 5 and 3 are returned to the leftmost recursive call on the 3rd level, where they are compared and swapped. Then [3, 5] from the left recursive call and [4] from the right recursive call are returned to the 2nd level. On the 2nd level we check for out-of-order elements between the two returned arrays, merge them in sorted order, and return the result to the next level up. This continues until we have sorted the entire problem.
The trickiest part of implementing Merge Sort is handling the sorting between the two arrays returned by the pair of recursive calls. Comparing two one-element arrays is simple. But how do we handle comparisons between larger arrays?
It turns out it’s not complicated at all. Both arrays are already sorted, and all we have to do is walk through both arrays at once and compare the lowest values not yet added to the output. Each array gets its own iterator which only advances when one of its elements is added to the output. If one array completes its iteration first, then we know all the remaining elements in the other array must be bigger, and the entire remainder of that array is added to the output array.
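Here is a minimal sketch of just that merge step (the helper name merge is mine; the full Merge Sort implementation below inlines the same logic):
def merge(a, b):
    # a and b are already sorted; walk through both at once
    c = []
    i = 0
    j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # Whatever remains in the unfinished array is larger than everything in c
    c += a[i:]
    c += b[j:]
    return c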
Let’s look at our example, focusing on the last sorting step with arrays [3, 4, 5] and [1, 2]. i and j represent the iterators for the two arrays:
[ 3, 4, 5 ] + [ 1, 2 ] = [ 1 ]               j's value was smaller so it gets added
[ 3, 4, 5 ] + [ 1, 2 ] = [ 1, 2 ]            j's value was smaller so it gets added
[ 3, 4, 5 ] + [ 1, 2 ] = [ 1, 2, 3, 4, 5 ]   j completed its iteration so all of i's remaining elements are added to the output
Looking at the tree diagram of all the recursive calls you can see the characteristic halving of the input that shows up in O(log(n)) functions. However, in addition we are iterating through the arrays at each recursion level. n elements are iterated log(n) times, or in other words O(n log(n)) — log-linear on the graph included earlier. This shows Merge Sort to be substantially faster than Bubble and Selection Sort.
Here is an example implementation of Merge Sort:
def mergeSort(arr):
    if len(arr) <= 1:
        return arr
    else:
        # Split the array in half and sort each half recursively
        a = arr[:len(arr) // 2]
        b = arr[len(arr) // 2:]
        a = mergeSort(a)
        b = mergeSort(b)

        # Merge the two sorted halves
        c = []
        i = 0
        j = 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                c.append(a[i])
                i = i + 1
            else:
                c.append(b[j])
                j = j + 1
        # Append whatever remains of the unfinished half
        c += a[i:]
        c += b[j:]
        return c
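And a quick check on the walkthrough example:
print(mergeSort([5, 3, 4, 1, 2]))  # [1, 2, 3, 4, 5]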
With a few modifications you can use Merge Sort to count the number of inversions in an array. This allows you to do a lot of interesting things beyond sorting numbers. One use is creating a simple recommendation engine like Amazon’s “Customers Who Bought This Item Also Bought” feature. In a later post I’ll go into detail on this and present an example recommendation algorithm using Merge Sort.
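As a rough preview of the idea (just a sketch with my own naming, not the recommendation algorithm itself): every time an element from the right half is merged in ahead of elements still waiting in the left half, each of those waiting elements counts as one inversion.
def sortAndCount(arr):
    # Returns the sorted array along with the number of inversions found
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    a, leftInv = sortAndCount(arr[:mid])
    b, rightInv = sortAndCount(arr[mid:])
    c = []
    i = 0
    j = 0
    splitInv = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            # b[j] jumps ahead of every remaining element in a
            c.append(b[j])
            j += 1
            splitInv += len(a) - i
    c += a[i:]
    c += b[j:]
    return c, leftInv + rightInv + splitInv

print(sortAndCount([5, 3, 4, 1, 2]))  # ([1, 2, 3, 4, 5], 8)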
That’s it for this post. I hope you found it helpful. If you have any corrections or additions, please post below.