Friday 6 December 2013

Sorting and Practicality



Sorting, and the practicality of sorting algorithms, is a significant area of study in computer science. How do we sort large amounts of data in a reasonable amount of time? Attempts to answer this question have produced numerous algorithms, and perhaps the three most popular taught in lower-level courses are Selection Sort, Quicksort, and Merge Sort.

Selection sort is the simplest of the trio. It is an in-place sort that repeatedly takes the minimum element of the unsorted portion of the list and places it at the end of the sorted portion. It is simple, yet slow for larger lists, having O(n²) time complexity. This may not seem slow, but when working with thousands of items it can take minutes to sort the list, compared to seconds with quicksort or even merge sort. I personally haven't found much use for this particular algorithm, as it still performs worse than insertion sort even on its optimal input (smaller lists).
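
As a rough illustration (not the course's exact code), a minimal selection sort in Python looks like this:

def selection_sort(lst):
    # Repeatedly select the minimum of the unsorted portion
    # and swap it into place at the front, in-place.
    for i in range(len(lst)):
        min_index = min(range(i, len(lst)), key=lambda j: lst[j])
        lst[i], lst[min_index] = lst[min_index], lst[i]
    return lst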


Quicksort is the next simplest. It works recursively, taking what's known as a pivot element and partitioning the remaining items to its left or right depending on whether they are less than or greater than the pivot. Its average-case complexity is O(n log n), though there are cases where it degrades to O(n²). Overall I find quicksort to be the simplest and most useful of the sorting algorithms taught, though it too is outperformed by other algorithms.
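
For comparison, here is a minimal quicksort sketch (a simple version that is not in-place, for clarity):

def quicksort(lst):
    # Base case: lists of 0 or 1 elements are already sorted.
    if len(lst) <= 1:
        return lst
    # Partition the rest around the first element as the pivot.
    pivot, rest = lst[0], lst[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)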


Merge Sort is the final one we were fully taught. Like Quicksort, it is a recursive, divide-and-conquer algorithm. It works by dividing the list into halves repeatedly until each piece is one element long, at which point it is sorted by definition, and then merging the halves back together in order. Like Quicksort, merge sort is also O(n log n), and in its case this holds even in the worst case.
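
A minimal merge sort sketch, again just for illustration:

def merge(left, right):
    # Merge two sorted lists into one sorted list.
    result = []
    while left and right:
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return result + left + right

def merge_sort(lst):
    # Divide until each piece is one element long, then merge back together.
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))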


I feel it is worth noting that there is no single best sorting algorithm. All sorts are data- and context-dependent. For small enough lists, selection and insertion sort can outperform recursive algorithms like quicksort and merge sort. Also worth noting: time efficiency does not mean space efficiency. In situations where memory is limited, some algorithms that are faster in an academic setting lose their usefulness.




Thursday 5 December 2013

O, Omega, and Theta

Ever had that one program that takes absolutely forever to execute? Ever wondered if you can safely get a cup of coffee while you wait for it to finish, or if you can go sleep for a few hours and come back?

We finally have an answer to these pressing questions. I introduce to you Big O, Big Θ, and Big Ω. These three notations give us a good estimate of how slowly (or quickly!) our programs will run as the input grows large.

Big O is used to describe an upper bound on the runtime. This can be extremely useful depending on how fast your runtime grows with the input. If you have a program that runs in O(n!) time for example, you're going to want to leave your program running overnight, possibly for days.


When I ran bogosort during Lab 8, it took 35 minutes to sort a 10-item list. Big O can help prevent that awkward situation where you want to leave it running, but you also want to be there to see it finish and you're not sure how long it will take.
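
For the curious, a minimal bogosort looks something like this (a sketch, not the lab's actual code):

import random

def bogosort(lst):
    # Shuffle until sorted: expected O(n * n!) runtime,
    # which is why even 10 items can take over half an hour.
    while any(lst[i] > lst[i + 1] for i in range(len(lst) - 1)):
        random.shuffle(lst)
    return lst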




Visually and mathematically, Big-O provides an upper bound by giving us a function g(n) that, when multiplied by some constant, eventually bounds the runtime from above. Intuitively, our runtime will grow no faster than g(n).

The other two bounds can be even more useful than Big-O. Big-Ω provides us with a lower bound on the runtime: rather than a ceiling on how long the program can take, it gives us a floor. For large enough inputs of size n, some constant times g(n) is at most our runtime; intuitively, our runtime grows no slower than g(n).


Finally, Big Θ provides us with perhaps the most useful bound on the runtime. It tells us that the runtime of our algorithm grows in the same way as some function g(n), bounding it from above and below by constant multiples of g(n).
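
For instance, if an algorithm's runtime is f(n) = 3n² + 2n, then f(n) is Θ(n²): for n ≥ 1 we have 3n² ≤ 3n² + 2n ≤ 5n², so constant multiples of n² bound the runtime from both below and above.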



These three notations are incredibly useful tools under the right circumstances, and are no doubt something every programmer should learn at some point. No matter how slow or fast your program is, it's still nice to know whether it will finish in the blink of an eye, or long after the heat death of the universe.

A2 and the wider world (Part II)

A finite state machine

The other connection between A2 and the larger concepts of computer science is found in a different, and perhaps better, way of handling the matching of regexes.

If you've looked far enough ahead into computer science, you will undoubtedly uncover a field known as automata theory. I won't go into too much detail on it here, as it is covered in later-year courses. Essentially, it is the study of what are called automata, or abstract machines. Having an understanding of automata and how they work provides a very nice solution to A2.

Regexes themselves can be represented as what are known as finite state machines. As the machine reads in input characters, it occupies a certain state, and depending on what the input was and what the transition behaviour of the machine is, it proceeds to another state.

Within the scope of A2, here's how it works:



We begin in our initial state and look to the first edge for the initial character.

If that character is an 'a', we can move on to our accepting state and then check the rest: if the next character is a 'b', we move into an intermediate state, then back to the accepting state if the following character is a 'c'.

If any of these conditions fail, we move into the rejecting state and, for our purposes, return False.

In Python, the machine looks something like this (it accepts 'a', 'abc', 'abcbc', and so on):

def matches(s):
    # Simulate the finite state machine one character at a time.
    state = 'initial'
    for ch in s:
        if state == 'initial':
            if ch == 'a':
                state = 'final'      # the accepting state
            else:
                return False         # rejecting state
        elif state == 'final':
            if ch == 'b':
                state = 'inter'      # intermediate state
            else:
                return False
        elif state == 'inter':
            if ch == 'c':
                state = 'final'
            else:
                return False
    # Accept only if we end in the accepting state.
    return state == 'final'

As I said before, everything eventually ties together. A2 more so than most.


Tuesday 26 November 2013

Assignment 2 and the wider world (Part 1)

    Everything in computer science ties together in some way, shape or form. Personally, I learn best by being able to make connections between things, and Assignment 2 is no exception.

    There are various concepts that come up in assignment 2 - Regexes, Binary Trees, recursion, string manipulation - but that isn't the most interesting part for me. The most interesting part is how it relates to the wider world.

    Assignment 2 introduces the concept of taking a set of rules - in this case a regex - that we can't directly work with, and turning them into something we can work with, in this case a tree.

    This concept comes back later on when you look at the broader characteristics of programming languages and compilers. What we are essentially doing in assignment 2 is taking a grammar, and generating a parse tree, or a concrete syntax tree from it. This allows us then to check if a sentence or string matches the specified grammar.
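
    To make that concrete, here is a minimal sketch (the node class and the regex syntax are hypothetical, not the assignment's actual code):

class RegexTree:
    # A node in the parse tree: a symbol plus zero or more children.
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

# (a|b) followed by c, with '.' standing for concatenation:
tree = RegexTree('.', [RegexTree('|', [RegexTree('a'), RegexTree('b')]),
                       RegexTree('c')])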

     It really is interesting how everything comes together at some point, and assignment 2 is yet another example of the importance of first year, introducing concepts that are the foundations of others.

Tuesday 19 November 2013

Encapsulation



Sometimes as a programmer, you don't want your fields and methods modified outside of their own class. Perhaps the best way to solve this is through information hiding, as we discussed in lecture, also known as encapsulation.

In Java, which I'm used to, this is done by declaring your fields with the private keyword, which prevents them from being seen and accessed from outside the class. In Python we can do something similar by adding an underscore before the name; for example, _size will protect the field to some extent, but only by convention.

In order to gain access to these now-private variables, we must write what are known as accessor and mutator methods: accessors return the value, and mutators change it.
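
As a minimal sketch in Python (the Stack class and its _items field here are just illustrative):

class Stack:
    def __init__(self):
        self._items = []            # "private" by convention only

    @property
    def size(self):                 # accessor: read the value
        return len(self._items)

    def push(self, item):           # mutator: change the value
        self._items.append(item)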

The fun and interesting part of all this? Gaining access to these variables without accessors or mutators. There are ways of doing this; my personal favourite is a technique known as introspection, or reflection.

Reflection allows you to examine and modify an object at runtime, and many languages support it. Java, for example, has a reflection library available for importing and using. Python, as an interpreted language, also supports reflection.

The powerful thing about reflection is that it allows you to examine classes, methods, interfaces, and fields at runtime without knowing their names before compilation. It also allows you to instantiate objects and invoke methods without necessarily knowing the names.
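
Continuing the sketch above, Python's built-in getattr, setattr, and dir do exactly this:

s = Stack()
s.push(42)
print(getattr(s, '_items'))    # [42] -- reads the field by name, bypassing the convention
setattr(s, '_items', [])       # overwrites the "private" field directly
print(dir(s))                  # lists every attribute and method of the object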

So, encapsulation has varying degrees of usefulness, all depending on who is using your code. It can stop many from modifying classes in unintended ways, but to the more knowledgeable programmer, there are always ways around encapsulation.

Sunday 27 October 2013

The Wonders of Stacks



Stack. A very simple and easy-to-understand data type, and a great example for the beginning programmer. It is useful for introducing the concept of Abstract Data Types to a beginner; to the more experienced programmer, however, stacks can be even more useful.

Consider the problem of a calculator. We should all know how to program at least a simple one that can carry out basic operations, along with those in any standard math library; however, when parentheses are introduced the problem becomes more challenging.

This is just one of many uses for a stack. It can be used to implement Dijkstra's shunting-yard algorithm for a more complex solution (handling operator precedence), or it can be used in a simpler case to parse the contents of the parentheses.

Consider (1 + 5) * 2.

If we use a stack, we can push the tokens in the expression onto it until we meet a closing parenthesis ')'. At this point we can pop items off the stack until we reach the opening parenthesis '('. Now we can simply evaluate the expression, push the result back onto the stack, and continue on, allowing us to simplify our problem.
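
Here is a minimal sketch of that idea in Python, assuming every operation is fully parenthesised and tokens are separated by spaces:

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def evaluate(expression):
    stack = []
    for token in expression.split():
        if token == ')':
            # Pop back to the matching '(' and evaluate what's inside.
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.pop()                      # discard the '('
            stack.append(OPS[op](left, right))
        elif token in OPS or token == '(':
            stack.append(token)
        else:
            stack.append(float(token))
    return stack.pop()

print(evaluate("( ( 1 + 5 ) * 2 )"))    # 12.0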

Whether you're an experienced programmer or a beginner, a stack can be a useful thing indeed.

Monday 14 October 2013



     Object oriented programming: a familiar technique to some, a foreign one to others. It depends on what language a person is used to. To those familiar with it, Object Oriented Programming (OOP) is a very natural, very easy way to solve problems. It allows the programmer to take a concept they're used to and represent it as classes, interfaces, and objects. Classes can have attributes (or fields, as some know them), which define their characteristics, and methods, which define their behaviours.

      Perhaps the most useful application of this concept is in creating Abstract Data Types, or ADTs. These are objects we create and model within programming languages by representing a real-world concept or object in code. Consider a well-known type, the Stack: a list where new items can only be placed on top and removed from the top. Here we have taken a real-world concept (a stack of items) and modelled it within a programming language using lists and built-in functions.
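
      As a minimal sketch of the idea in Python (the names are just illustrative):

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)    # place a new item on top

    def pop(self):
        return self._items.pop()    # remove and return the top item

    def is_empty(self):
        return self._items == []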

What other objects can you think of that we can represent as ADTs?