Introduction
Sometimes data we store or retrieve in an application can have little or no order. We may have to rearrange the data to correctly process it or efficiently use it. Over the years, computer scientists have created many sorting algorithms to organize data.
In this article we'll have a look at popular sorting algorithms, understand how they work and code them in Python. We'll also compare how quickly they sort items in a list.
For simplicity, the implementations will sort lists of numbers in ascending order. Of course, you're free to adapt them to your needs.
Bubble Sort
This simple sorting algorithm iterates over a list, comparing elements in pairs and swapping them until the larger elements "bubble up" to the end of the list, and the smaller elements stay at the "bottom".
Explanation
We begin by comparing the first two elements of the list. If the first element is larger than the second, we swap them. If they are already in order, we leave them as they are. We then move to the next pair of elements, compare their values and swap as necessary. This process continues until we reach the last pair of items in the list.
Upon reaching the end of the list, the whole process repeats for every item. This is highly inefficient, though. What if only a single swap needs to be made in the list? Why would we still iterate through it n^2 times, even though it's already sorted?
Obviously, to optimize the algorithm, we need to stop it when it's finished sorting.
How would we know that we're finished sorting? If the items were already in order, we would not have to swap any of them. So, whenever we swap values we set a flag to True to repeat the sorting process. If no swaps occurred, the flag would remain False and the algorithm would stop. For example, sorting [3, 1, 2] takes two passes: the first pass swaps 3 with 1 and then 3 with 2, while the second pass makes no swaps, so the flag stays False and we stop.
Implementation
With the optimization, we can implement the bubble sort in Python as follows:
def bubble_sort(nums):
    # We set swapped to True so the loop runs at least once
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                # Swap the elements
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                # Set the flag to True so we'll loop again
                swapped = True

# Verify it works
random_list_of_nums = [5, 2, 1, 8, 4]
bubble_sort(random_list_of_nums)
print(random_list_of_nums)
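If everything went well, this prints the sorted list: [1, 2, 4, 5, 8].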
The algorithm runs in a while loop, only breaking when no items are swapped. We set swapped to True in the beginning to ensure that the algorithm runs at least once.
Time Complexity
Bubble Sort's time complexity in the worst case scenario (when the list is in reverse order) is O(n^2), since each of the roughly n passes compares about n pairs of elements.
Selection Sort
This algorithm segments the list into two parts: sorted and unsorted. We continuously remove the smallest element of the unsorted segment of the list and append it to the sorted segment.
Explanation
In practice, we don't need to create a new list for the sorted elements; what we do is treat the leftmost part of the list as the sorted segment. We then search the entire list for the smallest element and swap it with the first element.
Now that we know the first element of the list is sorted, we find the smallest element among the remaining items and swap it with the second element. This repeats until the last item of the list is the only remaining element to be examined.
Implementation
def selection_sort(nums):
    # This value of i corresponds to how many values were sorted
    for i in range(len(nums)):
        # We assume that the first item of the unsorted segment is the smallest
        lowest_value_index = i
        # This loop iterates over the unsorted items
        for j in range(i + 1, len(nums)):
            if nums[j] < nums[lowest_value_index]:
                lowest_value_index = j
        # Swap values of the lowest unsorted element with the first unsorted
        # element
        nums[i], nums[lowest_value_index] = nums[lowest_value_index], nums[i]

# Verify it works
random_list_of_nums = [12, 8, 3, 20, 11]
selection_sort(random_list_of_nums)
print(random_list_of_nums)
We see that as i increases, we need to check fewer items.
Time Complexity
Selection Sort's time complexity on average is O(n^2). No matter how the input is ordered, it always performs (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 comparisons, so its best, average and worst cases are the same.
Insertion Sort
Like Selection Sort, this algorithm segments the list into sorted and unsorted parts. It iterates over the unsorted segment, and inserts the element being viewed into the correct position of the sorted list.
Explanation
We assume that the first element of the list is sorted. We then go to the next element, let's call it x. If x is larger than the first element, we leave it as is. If x is smaller, we copy the value of the first element to the second position and then set the first element to x.
As we go through the other elements of the unsorted segment, we continuously move larger elements in the sorted segment up the list until we encounter an element smaller than x or reach the end of the sorted segment, and then place x in its correct position. For example, inserting 6 into the sorted segment [1, 9, 15, 28] shifts 28, 15 and 9 one position up and places 6 right after 1.
Implementation
def insertion_sort(nums):
    # Start on the second element as we assume the first element is sorted
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        # And keep a reference of the index of the previous element
        j = i - 1
        # Move all items of the sorted segment forward if they are larger than
        # the item to insert
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            j -= 1
        # Insert the item
        nums[j + 1] = item_to_insert

# Verify it works
random_list_of_nums = [9, 1, 15, 28, 6]
insertion_sort(random_list_of_nums)
print(random_list_of_nums)
Time Complexity
Insertion Sort's time complexity on average is O(n^2). However, because it only shifts elements as far as needed, it runs in O(n) time when the list is already (or nearly) sorted.
Heap Sort
This popular sorting algorithm, like the Insertion and Selection sorts, segments the list into sorted and unsorted parts. It converts the unsorted segment of the list to a Heap data structure, so that we can efficiently determine the largest element.
Explanation
We begin by transforming the list into a Max Heap, a Binary Tree where the biggest element is the root node. We then place that item at the end of the list. We then rebuild our Max Heap, which now has one less value, placing the new largest value before the last item of the list.
We repeat this process of rebuilding the heap and moving its root to the back until all items have been placed in their sorted position.
Implementation
We create a helper function heapify to implement this algorithm. Since a binary heap can be stored in the list itself, with the children of the element at index i located at indices 2*i + 1 and 2*i + 2, we don't need a separate tree structure:
def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2

    # If the left child of the root is a valid index, and the element is greater
    # than the current largest element, then update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child

    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child

    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it's the largest
        heapify(nums, heap_size, largest)

def heap_sort(nums):
    n = len(nums)

    # Create a Max Heap from the list
    # The 2nd argument of range means we stop at the element before -1 i.e.
    # the first element of the list.
    # The 3rd argument of range means we iterate backwards, reducing the count
    # of i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)

    # Move the root of the max heap to the end of the list
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)

# Verify it works
random_list_of_nums = [35, 12, 43, 8, 51]
heap_sort(random_list_of_nums)
print(random_list_of_nums)
Time Complexity
Heap Sort's time complexity on average is O(n log(n)), which is already significantly faster than the previous algorithms.
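If you'd like to get a rough feel for that difference yourself, here is a quick and admittedly unscientific comparison using Python's built-in timeit module. It assumes the bubble_sort and heap_sort functions above are defined in the same file; the list size and the number of runs are arbitrary choices:
import random
from timeit import timeit

# A thousand numbers in random order (the size is an arbitrary choice)
nums = [random.randint(0, 1000) for _ in range(1000)]

# Each call sorts a fresh copy, so both algorithms get identical input
print("Bubble Sort:", timeit(lambda: bubble_sort(list(nums)), number=10))
print("Heap Sort:  ", timeit(lambda: heap_sort(list(nums)), number=10))
On any reasonably modern machine you should see heap_sort finish well ahead of bubble_sort.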
Merge Sort
This divide and conquer algorithm splits a list in half, and keeps splitting the resulting halves until each one contains only a single element.
Adjacent single elements then become sorted pairs, sorted pairs are merged into larger sorted lists, and so on. This process continues until we get a sorted list containing all the elements of the unsorted input list.
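To make the description above concrete, here is a minimal sketch of the approach. The function names merge_sort and merge are just illustrative choices, and unlike the earlier in-place functions, this sketch returns a new sorted list:
def merge(left_list, right_list):
    # Merge two already-sorted lists into a single sorted list
    sorted_list = []
    i = j = 0
    while i < len(left_list) and j < len(right_list):
        if left_list[i] <= right_list[j]:
            sorted_list.append(left_list[i])
            i += 1
        else:
            sorted_list.append(right_list[j])
            j += 1
    # One of the lists is exhausted; the remainder of the other is already sorted
    sorted_list.extend(left_list[i:])
    sorted_list.extend(right_list[j:])
    return sorted_list

def merge_sort(nums):
    # A list of zero or one elements is already sorted
    if len(nums) <= 1:
        return nums
    # Split the list in half and sort each half recursively
    mid = len(nums) // 2
    left_list = merge_sort(nums[:mid])
    right_list = merge_sort(nums[mid:])
    # Merge the two sorted halves
    return merge(left_list, right_list)

# Verify it works
random_list_of_nums = [120, 45, 68, 250, 176]
print(merge_sort(random_list_of_nums))
Running this sketch prints [45, 68, 120, 176, 250].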