Here we are, the last of the mandatory SLOG posts and the last week of CSC148 as a whole. Today, I'll be sharing a little summary of what I have learnt so far about sorting and the efficiency of algorithms.
Sorting is something I became quite familiar with in CSC108. In both that course and this one, sorting has commonly referred to organizing a list of n integers from smallest to largest. For example, an unsorted list of n = 5 elements might look like this: [2, 3, 4, 1, 5], which after running through a sorting algorithm would come out looking nice and sorted like this: [1, 2, 3, 4, 5]. There's a plethora of sorting algorithms that have been created to handle this seemingly simple task, and each one uses a different method (or at least a variation on a pre-existing one). Bubble sort, for example, goes through the list comparing each element to the element on its right; if the element on the right is smaller, the two switch places. The algorithm keeps 'passing' over the list until no more swaps are needed and every element is sorted. Another sorting algorithm, insertion sort, works by building a sorted sub-list at the beginning of the list and "inserting" each new value into its proper spot, until the sub-list contains all n elements.
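To make that concrete, here's a rough sketch of both algorithms in Python. This isn't the code from lecture or the lab; the function names and the early-exit flag in bubble sort are my own additions.

```python
def bubble_sort(lst):
    """Sort lst in place by repeatedly swapping adjacent out-of-order pairs."""
    for end in range(len(lst) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if lst[i] > lst[i + 1]:
                # the element to the right is smaller, so the pair switches places
                lst[i], lst[i + 1] = lst[i + 1], lst[i]
                swapped = True
        if not swapped:  # a full pass with no swaps means the list is sorted
            break


def insertion_sort(lst):
    """Sort lst in place by growing a sorted sub-list at the front."""
    for j in range(1, len(lst)):
        value = lst[j]
        i = j - 1
        # shift larger items in the sorted sub-list one spot to the right
        while i >= 0 and lst[i] > value:
            lst[i + 1] = lst[i]
            i -= 1
        lst[i + 1] = value  # drop the new value into its proper spot
```

Running either one on the example list above, [2, 3, 4, 1, 5], gives back [1, 2, 3, 4, 5].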
I remember asking myself the following question when first encountering sorting algorithms: if the task is always the same, why are there so many of them? Well, put bluntly, the task is not always the same. Every sorting algorithm has its strengths and weaknesses; some are just plain slow, and some are just plain fast. Timsort, for example, as we saw in the lab last week, is blazing fast in virtually every case; it's little wonder it's the algorithm behind Python's built-in list sort. This is where efficiency comes in. On a small n, efficiency is of little concern. But when n grows to larger and larger values, which it tends to do in CS, especially when dealing with large amounts of data, efficiency becomes a critical concern.

In CSC148, algorithm efficiency is commonly analysed on a worst-case basis: that is, the situation where the list is arranged in a way that exposes the algorithm's most conspicuous inefficiencies. Big-oh notation is also used, which describes an upper bound on the growth rate of the function, ignoring any constants (e.g. 3n^2 is O(n^2)). For example, a few weeks ago Danny was showing us Quicksort in lecture. In the typical case it was quite fast, running in O(n log n), but when run on an already sorted list its efficiency dropped to an abysmal O(n^2), which is the worst case for Quicksort. Any runtime near O(log n) is very fast: n gets halved on every pass the algorithm makes, so even massive values of n are whittled down quickly. On the other end of the spectrum, O(n^2) is quite a poor runtime, and it happens to be the worst case of bubble sort: as described above, it may need about n passes over the list, looking at every element on each pass (and even in its best case it still has to look at every item at least once). At O(n^2), 1,000,000 items would take on the order of 1,000,000,000,000 comparisons--ouch!
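This isn't Danny's exact code, but a bare-bones Quicksort sketch along the same lines (first element as pivot, new lists returned rather than sorting in place) shows where that worst case comes from:

```python
def quicksort(lst):
    """Return a new sorted list, using the first element as the pivot."""
    if len(lst) < 2:
        return lst[:]  # a list of zero or one items is already sorted
    pivot = lst[0]
    smaller = [x for x in lst[1:] if x <= pivot]
    larger = [x for x in lst[1:] if x > pivot]
    # On a typical shuffled list the two halves are roughly balanced,
    # giving about log n levels of O(n) partitioning work: O(n log n) overall.
    # On an already sorted list, 'smaller' is always empty, so each call
    # shrinks the problem by only one element: n levels of O(n) work, i.e. O(n^2).
    return quicksort(smaller) + [pivot] + quicksort(larger)
```

The comments spell it out: the pivot choice is what turns a nicely balanced recursion into a lopsided one, which is why fancier versions of Quicksort pick a random or median-ish pivot instead. (On a large already-sorted list this toy version would also blow past Python's recursion limit, another symptom of the same lopsidedness.)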
The amount of time a single step of a function actually takes is also an important factor in efficiency, since that constant is what gets multiplied by the big-oh growth rate to estimate how long the function will take for a specific purpose on a specific machine. To maximize efficiency in this regard, it is important to make sure there are no redundancies at any step within the code. To bring up another example from the lab, we saw a decrease in runtime when the length of the list was stored in a variable rather than recomputed with len multiple times within a loop. Given that both processor time and our own time are valuable resources, efficiency is indeed an important aspect of algorithm design in programming.
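The lab's actual code isn't reproduced here, but the idea looked something like this (count_smaller and its parameters are just a made-up stand-in):

```python
def count_smaller(lst, value):
    """Count how many items in lst are smaller than value."""
    count = 0
    i = 0
    n = len(lst)   # compute the length once up front...
    while i < n:   # ...instead of re-evaluating len(lst) on every pass of the loop
        if lst[i] < value:
            count += 1
        i += 1
    return count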
Hope you enjoyed this post on sorting and efficiency!