Monday, June 19, 2006

Language of the week: Haskell

Tristan Sloughter and I decided that this summer we would try learning a language each week in order to expand our knowledge of different programming languages. Our intent is to learn the lanuage well enough in order to do a simple to moderate program in the language. While I won't be able to go into detail about each language in this blog, I would like to highlight some key points of the language.

For the first language, we chose Haskell. I had used Haskell before, but it had been a long time. I had also just learned Ocaml from CS440, which shares some similarities to Haskell. For our project we decided to write the LZW algorithm, which is a simple text compression algorithm.

Haskell is well suited for this task, as the algorithm is a straight sequential process on the text, and since Haskell is a functional language, it allows for an easy implementation. The code is as follows.

The first thing we need to do is create table. Basically this is just defining what table is.

> type LZWItem = (Int, String)
> type LZWTable = [LZWItem]

After that we just need to create a table instance

> lzwTable :: LZWTable
> lzwTable = []

For the main implementation I broke the code up into 4 parts.

1. encString: main function that processes the string and takes care of the table

> encString :: String -> LZWTable -> String -> [Int]
> encString [] [] = []
> encString (k:ks) (w:ws)
> | inTable lzwTable ((w:ws)++(k:[])) = encString ks lzwTable ((w:ws)++(k:[]))
> | otherwise = addCode lzwTable ((w:k:[])++ks) : encString ks lzwTable (k:[])

Here we simply take a string as input, and process it character by character according to the LZW algorithm. If it is in the table, we continue by getting one more character, otherwise we add the string to the table.

2. inTable: this is just a little helper functiong for encString to check if the string is in the table
> inTable :: LZWTable -> String -> Bool
> inTable table str = [ tCode | (tCode,tStr)<-table, tStr==str] /= []

3. addCode: adds code to the table for a new substring > addCode :: LZWTable -> String -> Int
> addCode table (x:xs)
> | table ++ [(length(table)+1,x:xs)] /= [] = getCode table (take (length (x:xs)-1) (x:xs))

We add the new code, while getting the code for the previous string per the LZW algorithm.

4. getCode: gets the code from the table for a previous string in the table

> getCode :: LZWTable -> String -> Int
> getCode table str = head [ tCode | (tCode,tStr)<-table, tStr==str]

This simply just looks up the code from the table.

As you can see the implementation is very straightforward, relatively easy to understand, and compact. Doing this in any other(non-functional) language like C would definitely take a lot more code.

I am very happy with Haskell, as everything seems very intuitive and logical. When programming with it, you have alot of "Duh, that is so easy and simple" moments that you don't get with other languages.

When comparing Haskell with the other functional language I knew before this, Ocaml, I would have to say Haskell is better. To me, Haskell just has a better feel to it and seems to flow a little better than Ocaml. Either language is great though compared to C/C++, Java, Perl, etc.

One other cool trick I would like to quickly add about Haskell is infinite lists. For example, here is how you generate a list of every integer:

> numsFrom n = n : numsFrom (n+1)

If you are looking for a good place to start, I recommend the book, "Haskell: The Craft of Functional Programming". It is great at going through the key concepts of Haskell and showing you great examples of usage that can help you solve almost any problem.

For the curious, our next language is going to be Erlang.