Sometimes data we store or retrieve in an application can have little or no order. We may have to rearrange the data to correctly process it or efficiently use it. Over the years, computer scientists have created many sorting algorithms to organize data.
In this article we'll have a look at popular sorting algorithms, understand how they work and code them in Python. We'll also compare how quickly they sort items in a list.
For simplicity, the implementations sort lists of numbers in ascending order. Of course, you're free to adapt them to your needs.
Bubble Sort is a simple sorting algorithm that iterates over a list, comparing elements in pairs and swapping them until the larger elements "bubble up" to the end of the list, and the smaller elements stay at the "bottom".
We begin by comparing the first two elements of the list. If the first element is larger than the second element, we swap them. If they are already in order we leave them as is. We then move to the next pair of elements, compare their values and swap as necessary. This process continues to the last pair of items in the list.
Upon reaching the end of the list, the algorithm repeats this process for every item. This is highly inefficient, though. What if only a single swap needs to be made in the list? Why would we still iterate through it n times, making on the order of n^2 comparisons, even though it's already sorted?
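To make that cost concrete, here is a minimal sketch of the unoptimized version. The name `naive_bubble_sort` and its comparison counter are hypothetical, added only for illustration:

```python
def naive_bubble_sort(nums):
    """Unoptimized sketch: always makes len(nums) passes over the list."""
    n = len(nums)
    comparisons = 0  # illustrative counter, not needed for sorting
    for _ in range(n):
        for i in range(n - 1):
            comparisons += 1
            if nums[i] > nums[i + 1]:
                # Swap the out-of-order pair
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
    return comparisons


almost_sorted = [1, 2, 4, 3, 5]
count = naive_bubble_sort(almost_sorted)
print(almost_sorted)  # [1, 2, 3, 4, 5]
print(count)          # 20, i.e. n * (n - 1) comparisons for n = 5
```

Only one swap was needed here, yet the function still performed all 20 comparisons.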
Obviously, to optimize the algorithm, we need to stop it when it's finished sorting.
How would we know that we're finished sorting? If the items were in order then we would not have to swap items. So, whenever we swap values we set a flag to `True` to repeat the sorting process. If no swaps occurred, the flag would remain `False` and the algorithm would stop.
With the optimization, we can implement the bubble sort in Python as follows:
```python
def bubble_sort(nums):
    # We set swapped to True so the loop runs at least once
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                # Swap the elements
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                # Set the flag to True so we'll loop again
                swapped = True


# Verify it works
random_list_of_nums = [5, 2, 1, 8, 4]
bubble_sort(random_list_of_nums)
print(random_list_of_nums)
```
The algorithm runs in a `while` loop, only breaking when no items are swapped. We set `swapped` to `True` in the beginning to ensure that the algorithm runs at least once.
Bubble Sort's time complexity in the worst case scenario (when the list is in reverse order) is O(n^2).
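The gap between the best and worst case shows up when we count passes. This is a small sketch, assuming a hypothetical helper `bubble_sort_passes` that uses the same flag-based loop and returns how many passes it made:

```python
def bubble_sort_passes(nums):
    """Sort nums in place with the swapped flag; return the number of passes."""
    passes = 0  # illustrative counter
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                swapped = True
    return passes


sorted_passes = bubble_sort_passes([1, 2, 3, 4, 5])
reversed_passes = bubble_sort_passes([5, 4, 3, 2, 1])
print(sorted_passes)    # 1
print(reversed_passes)  # 5
```

An already sorted list needs a single pass, while a reversed list of n items needs n passes of n - 1 comparisons each, which is where the quadratic worst case comes from.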
Selection Sort segments the list into two parts: sorted and unsorted. We continuously remove the smallest element of the unsorted segment of the list and append it to the sorted segment.
In practice, we don't need to create a new list for the sorted elements. Instead, we treat the leftmost part of the list as the sorted segment. We then search the entire list for the smallest element, and swap it with the first element.
Now that we know the first element of the list is sorted, we get the smallest element of the remaining items and swap it with the second element. This repeats until the last item of the list is the only remaining element to be examined.
```python
def selection_sort(nums):
    # This value of i corresponds to how many values were sorted
    for i in range(len(nums)):
        # We assume that the first item of the unsorted segment is the smallest
        lowest_value_index = i
        # This loop iterates over the unsorted items
        for j in range(i + 1, len(nums)):
            if nums[j] < nums[lowest_value_index]:
                lowest_value_index = j
        # Swap values of the lowest unsorted element with the first unsorted
        # element
        nums[i], nums[lowest_value_index] = nums[lowest_value_index], nums[i]


# Verify it works
random_list_of_nums = [12, 8, 3, 20, 11]
selection_sort(random_list_of_nums)
print(random_list_of_nums)
```
We see that as `i` increases, we need to check fewer items.
Selection Sort's time complexity on average is O(n^2).
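Unlike the flag-based Bubble Sort, Selection Sort does the same number of comparisons no matter how the input is ordered. A minimal sketch, assuming a hypothetical `selection_sort_comparisons` helper that counts them:

```python
def selection_sort_comparisons(nums):
    """Sort nums in place and return how many comparisons were made."""
    comparisons = 0  # illustrative counter
    for i in range(len(nums)):
        lowest = i
        for j in range(i + 1, len(nums)):
            comparisons += 1
            if nums[j] < nums[lowest]:
                lowest = j
        nums[i], nums[lowest] = nums[lowest], nums[i]
    return comparisons


already_sorted = selection_sort_comparisons([1, 2, 3, 4, 5])
reversed_order = selection_sort_comparisons([5, 4, 3, 2, 1])
print(already_sorted)  # 10
print(reversed_order)  # 10
```

For n = 5 items that is 4 + 3 + 2 + 1 = 10 comparisons, or n(n - 1)/2 in general, which grows quadratically.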
Like Selection Sort, Insertion Sort segments the list into sorted and unsorted parts. It iterates over the unsorted segment, and inserts the element being viewed into the correct position of the sorted list.
We assume that the first element of the list is sorted. We then go to the next element, let's call it `x`. If `x` is larger than the first element we leave it as is. If `x` is smaller, we copy the value of the first element to the second position and then set the first element to `x`.
As we go to the other elements of the unsorted segment, we continuously move larger elements in the sorted segment up the list until we encounter an element smaller than `x` or reach the end of the sorted segment, and then place `x` in its correct position.
```python
def insertion_sort(nums):
    # Start on the second element as we assume the first element is sorted
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        # And keep a reference of the index of the previous element
        j = i - 1
        # Move all items of the sorted segment forward if they are larger than
        # the item to insert
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            j -= 1
        # Insert the item
        nums[j + 1] = item_to_insert


# Verify it works
random_list_of_nums = [9, 1, 15, 28, 6]
insertion_sort(random_list_of_nums)
print(random_list_of_nums)
```
Insertion Sort's time complexity on average is O(n^2).
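How much work Insertion Sort does depends heavily on the input order. As a rough illustration, the sketch below counts how many elements get shifted; `insertion_sort_shifts` is a hypothetical name used only for this example:

```python
def insertion_sort_shifts(nums):
    """Sort nums in place and return how many element shifts were needed."""
    shifts = 0  # illustrative counter
    for i in range(1, len(nums)):
        item_to_insert = nums[i]
        j = i - 1
        while j >= 0 and nums[j] > item_to_insert:
            nums[j + 1] = nums[j]
            j -= 1
            shifts += 1
        nums[j + 1] = item_to_insert
    return shifts


sorted_shifts = insertion_sort_shifts([1, 2, 3, 4, 5])
reversed_shifts = insertion_sort_shifts([5, 4, 3, 2, 1])
print(sorted_shifts)    # 0
print(reversed_shifts)  # 10
```

An already sorted list needs no shifts at all, so the algorithm does linear work in that case, while a reversed list needs n(n - 1)/2 shifts.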
Heap Sort, like the Insertion and Selection sorts, segments the list into sorted and unsorted parts. It converts the unsorted segment of the list to a Heap data structure, so that we can efficiently determine the largest element.
We begin by transforming the list into a Max Heap - a Binary Tree where the biggest element is the root node. We then place that item at the end of the list and rebuild our Max Heap, which now has one less value, placing the new largest value before the last item of the list.
We iterate this process of building the heap until all nodes are removed.
We create a helper function `heapify` to implement this algorithm:
```python
def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2

    # If the left child of the root is a valid index, and the element is greater
    # than the current largest element, then update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child

    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child

    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it's the largest
        heapify(nums, heap_size, largest)


def heap_sort(nums):
    n = len(nums)

    # Create a Max Heap from the list
    # The 2nd argument of range means we stop at the element before -1 i.e.
    # the first element of the list.
    # The 3rd argument of range means we iterate backwards, reducing the count
    # of i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)

    # Move the root of the max heap to the end of the list
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)


# Verify it works
random_list_of_nums = [35, 12, 43, 8, 51]
heap_sort(random_list_of_nums)
print(random_list_of_nums)
```
Heap Sort's time complexity on average is O(n log n), which is already significantly faster than the previous algorithms.
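For comparison, Python's standard library ships a heap implementation in the `heapq` module. The sketch below is a different variant from the in-place Max Heap approach described here: `heapq` maintains a min-heap, and this hypothetical `heap_sort_with_heapq` helper returns a new list rather than sorting in place.

```python
import heapq


def heap_sort_with_heapq(nums):
    """Return a new ascending list by repeatedly popping from a min-heap."""
    heap = list(nums)    # copy so the caller's list is left untouched
    heapq.heapify(heap)  # rearrange the copy into a min-heap
    # Each pop removes and returns the smallest remaining element
    return [heapq.heappop(heap) for _ in range(len(heap))]


result = heap_sort_with_heapq([35, 12, 43, 8, 51])
print(result)  # [8, 12, 35, 43, 51]
```

Because the smallest element is popped first, the results come out in ascending order without the final swap step.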
Merge Sort is a divide and conquer algorithm that splits a list in half, and keeps splitting the resulting lists in half until each one holds a single element.
Adjacent elements become sorted pairs, then sorted pairs are merged and sorted with other pairs as well. This process continues until we get a sorted list with all the elements of the unsorted input list.
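The split-and-merge process described above can be sketched as follows. This is one possible recursive version rather than a definitive implementation. The function names `merge` and `merge_sort` are assumptions, and unlike the in-place functions earlier in the article, this sketch returns a new list.

```python
def merge(left, right):
    """Merge two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(nums):
    """Return a new sorted list; the input list is not modified."""
    # A list of zero or one element is already sorted
    if len(nums) <= 1:
        return nums[:]
    mid = len(nums) // 2
    return merge(merge_sort(nums[:mid]), merge_sort(nums[mid:]))


merged_result = merge_sort([120, 45, 68, 250, 176])
print(merged_result)  # [45, 68, 120, 176, 250]
```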