Big O Notation

From Wayne's Dusty Box of Words

The simplest definition I can give for Big-O notation is this:

Big-O notation is a relative representation of the complexity of an algorithm.

There are some important and deliberately chosen words in that sentence:

  • relative: you can only compare apples to apples. You can't compare an algorithm that does arithmetic multiplication to an algorithm that sorts a list of integers. But a comparison of two algorithms to do arithmetic operations (one multiplication, one addition) will tell you something meaningful.
  • representation: Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering).
  • complexity: if it takes me one second to sort 10,000 elements, how long will it take me to sort one million? Complexity in this instance is a measure relative to something else.

It shows how the algorithm scales with the size of the input or dataset it operates on.

For example, let's look at the Big-O of doing arithmetic. Take two numbers (123456 and 789012) and consider the 4 basic operations:

  • Addition
  • Subtraction
  • Multiplication
  • Division

Let's look at the algorithm to solve each.

The addition is the simplest. You line the numbers up (to the right) and add the digits in each column, writing the last digit of that sum in the result. The 'tens' part of that sum is carried over to the next column.

Let's assume that the addition of two digits is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to perform 6 digit additions (and possibly carry a 7th). If we add two 100-digit numbers together we have to do 100 additions. If we add two 10,000-digit numbers we have to do 10,000 additions.

See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
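To make that concrete, here's a minimal Python sketch of schoolbook addition that counts the column additions. This is my illustration, not code from the original; the digit-list representation and the name add_digits are assumptions for the example:

    def add_digits(a, b):
        # Schoolbook addition of two non-negative numbers given as digit
        # lists, least significant digit first. Counts column additions.
        ops = 0
        result = []
        carry = 0
        for i in range(max(len(a), len(b))):
            da = a[i] if i < len(a) else 0
            db = b[i] if i < len(b) else 0
            carry, digit = divmod(da + db + carry, 10)
            result.append(digit)
            ops += 1                      # one column = one digit addition
        if carry:
            result.append(carry)
        return result, ops

    # 123456 + 789012, digits reversed (least significant first)
    digits, ops = add_digits([6, 5, 4, 3, 2, 1], [2, 1, 0, 9, 8, 7])
    print(digits[::-1], ops)              # [9, 1, 2, 4, 6, 8] and 6 ops

Doubling the number of digits doubles the count of column additions: exactly the O(n) behaviour described above.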

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take each digit of the bottom number in turn, and multiply it against each digit of the top number. So to multiply our two 6-digit numbers we must do 36 multiplications. We may also need to do as many as 10 or 11 column additions to get the end result.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one-million-digit numbers we need to do one trillion (10¹²) multiplications and two million adds.

As the algorithm scales with n squared, this is O(n²) or quadratic complexity.
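Here's the same kind of sketch for long multiplication (again my illustration; the name multiply_digits is assumed). The nested loops are the point: every digit of one number meets every digit of the other, giving n × n multiplications:

    def multiply_digits(a, b):
        # Schoolbook long multiplication of digit lists (least significant
        # digit first). Counts single-digit multiplications only; the
        # column additions are a lower-order cost.
        ops = 0
        result = [0] * (len(a) + len(b))
        for i, db in enumerate(b):        # each digit of the bottom number
            carry = 0
            for j, da in enumerate(a):    # against each digit of the top
                ops += 1
                total = result[i + j] + da * db + carry
                carry, result[i + j] = divmod(total, 10)
            result[i + len(a)] += carry
        return result, ops

    _, ops = multiply_digits([6, 5, 4, 3, 2, 1], [2, 1, 0, 9, 8, 7])
    print(ops)                            # 36 for two 6-digit numbers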

NOTE: We only care about the most significant portion of complexity.

The astute may have realized that we could have expressed the number of operations in multiplication as n² + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for only 0.0002% of the total operations).
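A quick back-of-the-envelope check in Python makes the point:

    n = 1_000_000                  # two numbers of a million digits each
    total = n**2 + 2 * n           # multiplications plus the extra adds
    print(100 * 2 * n / total)     # ~0.0002, i.e. 0.0002% of the work

The n² term so thoroughly dominates that we simply drop everything else.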

Big-O notation is usually about the worst-case scenario of an algorithm.

As a practical matter, Big O can be used to determine:

  • Best Case: This is the best possible case for the algorithm. For example, in a binary search the best case is that we find the target in a single comparison. This is O(1) or constant complexity.
  • Expected Case: This is the typical case for the expected data set. For a binary search this is O(log n) or logarithmic complexity.
  • Worst Case: This is the worst the algorithm can do. For a binary search on an ordered data set this is also O(log n). If the data is unordered, a binary search doesn't apply at all; the best you can do is a linear scan, whose worst case is a grim O(n). That doesn't sound bad until you think about the size of the data you might search. (The sketch below makes these cases concrete.)
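Here's that sketch: a minimal iterative binary search in Python (my code, not the article's):

    def binary_search(data, target):
        # Search a *sorted* list. Each iteration halves the remaining
        # range, so the worst case takes O(log n) comparisons.
        lo, hi = 0, len(data) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if data[mid] == target:
                return mid           # best case: the first probe hits, O(1)
            elif data[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1                    # not present

    print(binary_search([2, 3, 5, 7, 11, 13], 5))   # 2, found on the first probe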

When the Big-O of an algorithm is discussed, we should assume we are talking about the worst case.

From best to worst, the classes of complexity are:

  • O(1): known as Constant complexity
  • O(log n): known as Logarithmic complexity
  • O(n): known as Linear complexity
  • O(n²): known as Quadratic complexity
  • O(n!): known as Factorial or Combinatorial complexity
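To get a feel for how quickly these classes pull apart, here's a small (hypothetical) Python table of operation counts as n grows:

    import math

    # Rough operation counts per complexity class; watch n! explode.
    print(f"{'n':>3} {'1':>3} {'log n':>6} {'n':>5} {'n^2':>6} {'n!':>22}")
    for n in (5, 10, 15, 20):
        print(f"{n:>3} {1:>3} {math.log2(n):>6.1f} {n:>5} {n**2:>6} {math.factorial(n):>22}")

At n = 20 a factorial algorithm already needs over two quintillion operations, while a logarithmic one needs about four.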

This touches upon a closely related topic in complexity:

Any algorithm that has a complexity of O(nᵃ) for some constant a is said to have polynomial complexity, or to be solvable in polynomial time. Algorithms of O(n), O(n²), etc. are all polynomial time. Some problems are harder than that: no polynomial-time algorithm is known for them. The traveling salesman problem and factoring large numbers are classic examples.
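For a taste of combinatorial complexity, here's a brute-force traveling salesman sketch in Python (my illustration; the distance matrix and the name shortest_tour are made up for the example). With n cities it examines (n - 1)! tours, which is why nobody solves large instances this way:

    from itertools import permutations

    def shortest_tour(dist):
        # Try every ordering of the cities: factorial complexity.
        cities = range(1, len(dist))          # fix city 0 as the start
        best_len, best_tour = float("inf"), None
        for perm in permutations(cities):
            tour = (0, *perm, 0)
            length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
            if length < best_len:
                best_len, best_tour = length, tour
        return best_len, best_tour

    # A tiny hypothetical 4-city distance matrix.
    dist = [[0, 2, 9, 10],
            [2, 0, 6, 4],
            [9, 6, 0, 3],
            [10, 4, 3, 0]]
    print(shortest_tour(dist))                # (18, (0, 1, 3, 2, 0))

Four cities mean only 6 tours to check, but 20 cities would mean over 10¹⁷ of them.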