Sunday, September 27, 2009

Big-O Notation

Big-O Notation measures the complexity of an algorithm or order of magnitude of the number of operations required to perform a function. In other words, how the number of processing steps increase as the number of items being processed increases. Processing may not increase in a constant manner. For instance if processing one item takes 2 seconds, processing 100 item does not necessarily take 200 seconds (2 * 100).

Big-O notation is a type of language or notation or terminology to discuss the the reality which is - algorithms need to process efficiently. Without knowing Big-O notation, an experienced programmer can look at an algorithm and understand that a step that can be moved to process outside of a loop will reduce the processing steps by the number of loops the algorithm has to perform less one. Instead of processing once for each item in the list, it can process that step only one time. For me, this underlying understanding of processing steps is more important than reciting Big-O Notation, but we must follow industry standards when applying for jobs. We need a way to communicate about complexity of an algorithm with other programmers. So I conform.

Common Big-O notation orders:

O(1) represents function that runs in constant time
The time to run an algorithm doesn't change based if the number of items being processed changes. Whether running 1 or 1000 items the time remains constant. This is rare. Looking up an element in an array usually takes the same amount of time no matter how many items are in the array and would be said to run in constant time.

O(N) represents function that runs in linear time
Operations to run directly proportional to number of items processed. For instance if it take 3 minutes to process 1 item it will take 3 X 10 = 30 minutes to process 10 items.

O(N2) represents a function that runs in quadratic time.
The equation for quadratic time is (N2 - N) / 2. Or in other words 0+1+2+...+(N-1). In Big-O notation constants are dropped so we have N2 - N. As the number of items increases, subtracting N from the result becomes a negligible difference so we can skip that and end up with N2. So if you have 2 items a function that runs in O(N2) roughly takes 4 processing steps and with 10 items takes 100 processing steps. An example would be inserting items being processed into an array in the first position and having to move every single item of the array each time to insert the new item. Functions running in quadratic time are typically not acceptable for interactive applications.

O(log N) and O(N log N) represent a functions that runs in logarithmic time.
The running time of an algorithm increases with the log of the number of items being processed. These generally mean that the algorithm deals with a data set that is partitioned into small groups of data as it is processed, like a balanced binary tree.

For instance if your asked to find the number I'm thinking of out of 100 you could ask, is it greater than or less than 50. I say greater. You say is it greater or less than 75. I say greater. You say is it greater or less than 87.5. I say greater...and continues until you get to the number I am thinking of which is 88. This is more efficient than saying, "Is it one? Is it two? Is it three?..." etc.

O(N!) represents a function that runs in factorial time.
Factorial of 5 would be 5 x 4 x 3 x 2 x 1. So if there were five items being processed and the algorithm runs in factorial time then it would require 120 processing steps. If an algorithm performs in factorial time, look for a better solution.

Related articles:
Determining Big O Notation
O (n log n)

No matter how much Big-O analysis you do, you'll still ultimately need to test your algorithms on large data sets and there are a number of factors which can affect performance outside a single algorithm in an application. Ultimately you need to test performance. As the following article states: "the only real way to know for sure is to actually try it with large data sets. There may be performance issues that are not taken into account by big-oh notation, eg, the effect on paging as virtual memory usage grows."

However during the design process, Big-O notation provides a way to talk about the complexity of an algorithm and how efficient it is expected to be in advance.

Big O Notation Notes

Thursday, September 10, 2009

REST vs. SOAP Web Services

I had a request to interview for a contract job using web services. I've recently worked with .NET Web services that were implemented using SOAP and done Java web services previously but wanted a refresher. I went to look for a SOAP web service on Google to test with and it seems like Google has switched entirely to REST. I never really thought about it but I guess REST is another type of web service. I just kind of associated web services with SOAP. Wikipedia has a pretty good definition of web services.

As it turns out a lot of the major web API players are switching to REST - which is good. When I used REST for a web site integrating with Amazon's IMDB database it was much simpler and cleaner to use. I had previously reviewed a technical book and tried to test various SOAP web service software components and spent hours getting any of them to work correctly. REST is so much easier, cleaner, simpler - I wondered why everyone wasn't using it. Seems like now they are.

This is kind of funny because I went to an interview after working on the Amazon project and one of the people I was interviewing with commented that he had never met anyone who used REST before "in real life".

Here's an article comparing REST vs. SOAP web services:

Rest vs. SOAP Web Services

Tuesday, September 01, 2009

Underscores in SQL Server Query

Well, learn something new every day. I've been using SQL Server for close to 15 years but I guess I never needed to query for an underscore before today. An underscore is a special character in SQL server.

You can use the _ wildcard character to represent any single character in a query. For instance if you want to search for any four letter word that starts with w and ends with at you could search for:

where field="w_at"

In order to query for an underscore in SQL Server you need to escape it. The escape character in SQL Server is square bracket []. So if you want to find a particular column with an undercore in it you can query using the like statement as follows:

where field LIKE '%[_]%'

If you want to query for anything with two underscores in it then you'd have to escape each underscore like this:

where field LIKE '%[_][_]%'