Day to day stuff

Tuesday, September 21, 2010

Why Functional Programming Matters - An exploration of functional Scala.

package nl.grons.whyfp
/**

A translation of Miranda examples in the paper Why Functional Programming Matters by John Hughes, 1984.

@author Erik van Oosten
*/

object WhyFp_Abstract {
/*

Scala is in the focus of attention because it allows a smooth migration from one of the most popular programming languages ever to a language that supports (among other things) a functional style of programming.

In this article I explore the functional aspects of Scala based on a very influential paper on functional programming: ‘Why Functional Programming matters’. To do so I will translate the Miranda examples of chapter 3 and 4 (next article) to Scala. To keep things pure, the standard Scala libraries are avoided.

I will conclude that Scala has sufficient features to live up to Hughes’ standards with the added bonus of being statically type safe.

Just like the paper I just assume you know a bit about functions, partial functions and partially applied functions. In addition I assume you know basic Scala syntax. However, as I am still a Scala newby myself, any comments that will improve this article are welcomed.

This article focusses on the Miranda -> Scala translation and will not repeat the message of the paper. So go ahead, and read the paper first.

All text in this article is properly commented, so the whole article can be copy pasted into your favourite text-editor.

*/
object Chapter3_GlueingFunctionsTogether {
import RichFunction2._
/*

The original article completely defines a list right there in just one line:

listof X ::= nil | cons X (listof X) (Miranda)

Scala is not so powerful and this mainly because we need to translate Miranda’s ‘|’ operator to a class hierarchy.

Lets define List more or less as the Scala library does it (as anonymous points out, it can be done much shorter).

/** The base type for our list. */
sealed abstract class List[+A] {
  def head: A
  def tail: List[A]
}

/** Cons represents an element and the rest of the list. */
case class Cons[A](hd: A, tl: List[A]) extends List[A] {
  override def head: A = hd
  override def tail: List[A] = tl
  override def toString = "Cons(" + hd + "," + tl + ")"
}

/** Nil represents the empty list. */
case object Nil extends List[Nothing] {
  override def head: Nothing = {
    throw new NoSuchElementException("head of empty list")
  }
  override def tail: List[Nothing] = {
    throw new NoSuchElementException("tail of empty list")
  }
  override def toString = "Nil"
}

Note that Cons is a case class, therefore we can use Cons(x,y) as synonym for the constructor and also as extractor during pattern matching.

We can now use the list as follows:

[]         => Nil
[1]        => Cons(1, Nil)
[1,2,3]    => Cons(1, Cons(2, Cons(3, Nil)))

As in the original article, what follows is a number of definitions of ‘sum’. I name each definition differently so that this whole article can be compiled as is.

My first approach mimics the Miranda approach with 2 partial functions. Scala does not automatically combine partial functions se we’ll need to do ourselves.

// Miranda:
// sum nil = 0
// sum (cons num list) = num + sum list
//
val sum_nil: PartialFunction[List[Int], Int] = { case Nil => 0 }
val sum_cons: PartialFunction[List[Int], Int] =
       { case Cons(num, list) => num + sum_combinedpartials(list) }
def sum_combinedpartials = sum_nil orElse sum_cons

The second, more idiomatic Scala, version wraps this into a single method:

def sum_idiomatic(li: List[Int]): Int = li match {
  case Nil => 0
  case Cons(num, list) => num + sum_idiomatic(list)
}

Note that both approaches use pattern matching, just like the Miranda original.

The following definitions of ‘sum’ are more interesting. They are very close to the original example. Note that this actually defines an immutable reference (a val) to a function. In case you are wondering, List[Int] => Int is the type of the function: it accepts one argument, List[Int], and has Int as return type.

// Miranda: sum = reduce add 0
val sum: List[Int] => Int = reduce(add, 0)

Note that nowadays most languages instead of ‘reduce’, use the term ‘foldr’ (fold-right).

Scala has type inference, so if we let the compiler derive the return type, its even closer. Note that we need to tuck an underscore to the end to let the compiler know we really want a partially applied function.

val sum_noTypes = reduce(add, 0) _

And of course we need a definition for ‘add’:

// Miranda: add x y = x + y
def add(x: Int, y: Int): Int = x + y

For function ‘reduce’, I again use pattern matching just like the Miranda original. Notice that ‘reduce’ defines 2 parameter lists. One with parameters f and x, and the second with parameter l. This allows for easy creation of partially applied functions. And this is exactly what happened in the definitions of ‘sum’ and ‘sum_noTypes’ above.

// Miranda
// (reduce f x) nil = x
// (reduce f x) (cons a l) = f a ((reduce f x) l)
//
/**
 * Right fold a list with a function f and a constant x.
 *
 * @param f a function to apply
 * @param x the result for an empty list
 * @param l the list to fold
 * @param [A] the type of list elements
 * @param [B] the result type
 * @return the result of the expression where in l,
 *   each Cons is replaced by f, and Nil by x
 */
def reduce[A, B](f: (A, B) => B, x: B)(l: List[A]): B = l match {
  case Nil => x
  case Cons(head, tail) => f(head, reduce(f, x)(tail))
}

More examples:

// Miranda: product = reduce multiply 1
def product: List[Int] => Int = reduce(multiply, 1)

Just for fun I define ‘multiply’ a little bit different then ‘add’. Where ‘add’ is a method (which can be used as a function), ‘multiply’ is directly defined as a function. Each underscore declares an anonymous parameter. To avoid the ugly (_: Int) syntax, the type of these anonymous parameters are derived form the return type of ‘multiply’.

val multiply: (Int, Int) => Int = _ * _

We can also inline these functions. If you do so the compiler still needs some hints to infer the types of the anonymous parameters. In the following definitions of ‘anytrue’ and ‘alltrue’ I solve that by adding the generic types of ‘reduce’ (the [Boolean,Boolean] part).

With ‘alltrue’ the return type is inferred by the compiler (again, this requires an underscore to confirm we want partial application of ‘reduce’). In the definition of ‘alltrue’, the return type is defined and therefore the underscore is not needed.

Personally I prefer the explicit return type as the function is too complex to just see what it should be. My rule is omit the return type only when it is blindingly obvious (as in ‘add’ above).

// Miranda
// anytrue = reduce or false
// alltrue = reduce and true
//
def anytrue: List[Boolean] => Boolean = reduce[Boolean, Boolean](_ || _, false)
def alltrue                           = reduce[Boolean, Boolean](_ && _, true) _

Lets also explore how ‘reduce’ works and calculate the sum of [1,2,3]. Note how we replace Cons with the given function ‘add’ and Nil with the given constant ‘0’.

Starting point:       Cons(1,     Cons(2,     Cons(3, Nil)))
Reduce applied:        add(1,      add(2,      add(3,  0 ))) == 6
And for multiply: multiply(1, multiply(2, multiply(3,  1 ))) == 6

Here is my translation of ‘append’.

// Miranda: append a b = reduce cons b a
def append[A](a: List[A], b: List[A]): List[A] = reduce[A, List[A]](Cons[A], b)(a)

Note that the first parameter we pass to ‘reduce’ is actually the Cons object (with generic type parameter [A]). This is possible because anything with an ‘apply’ method is recognized as a Scala function.

The generic type ‘[A]’ on Cons is essential; without it the compiler generates nothing but misleading error messages. It took me many hours to figure this one out.

If you are new to this complex kind of type declarations (like I am), you’ll find that careful reading is in order. I am afraid this is just something you need to practice. Also, it helps greatly when you wrote this kind of stuff yourself.

Lets skip ahead to ‘doubleall’.

// Miranda:
//        doubleall = reduce doubleandcons nil
// where  doubleandcons num list = cons (2*num) list
def doubleall_1: List[Int] => List[Int] = {
  def doubleandcons(num: Int, list: List[Int]) = Cons(2 * num, list)
  reduce(doubleandcons, Nil)
}

As you can see Miranda’s where keyword nicely translates to nested methods.

Next version:

// Miranda:
//        doubleandcons = fandcons double
// where  double n = 2*n
//        fandcons f el list = cons (f el) list
def doubleandcons: (Int, List[Int]) => List[Int] = {
  def double(n: Int) = 2 * n
  def fandcons[A, B](f: A => B)(el: A, list: List[B]) = Cons(f(el), list)
  fandcons(double)
}

So far, I had little problem with translating this to Scala. However, the next line needed quite a bit more thought:

fandcons f = cons . f

Scala (Miranda as well) only defines the ‘compose’ method on functions of arity 1. For Miranda this is no problem as all functions written down with multiple arguments are actually syntactic sugar for functions with a single argument that produce further functions with 1 argument. So the type of ‘add’ in

add x y = x + y

is not (Int, Int) => Int as it would be in Scala, but Int => Int => Int.

In our Scala code Cons.apply has 2 arguments and thus arity 2. To be precise, the type of Cons[B] _ is (B, List[B]) => List[B]. (Note in Scala we write Cons[B] _ to indicate we need it as a partially applied function).

In order to apply ‘compose’ we first need to convert Cons[B].apply to something of type B => List[B] => List[B] (a unary function that accepts a B, and gives another unary function that accepts a List[B], and gives a List[B]). This is called currying and is performed with the ‘curried’ function.

The result of currying is a unary function that we can compose with ‘f’. The next step is to uncurry back to a function of arity 2. This is done with ‘uncurried’. Not lets put this in code:

// Miranda: fandcons f = cons . f
def fandcons[A, B](f: A => B): (A, List[B]) => List[B] =
       Function.uncurried( (Cons[B] _).curried.compose(f) )

However, this is very a unwieldy syntax and I am going to need it a lot in the upcoming examples. Therefore, I’ll pimp the Scala library to include ‘compose’ also on functions of arity 2 (see RichFunction2 below).
We can now condense the code to:

// Miranda: fandcons f = cons . f
def fandcons_short[A, B](f: A => B): (A, List[B]) => List[B] =
       (Cons[B] _).compose(f)

And then we try it out with some arguments to see it is correct:

fandcons(f, el)(list) = Cons.compose(f)(el)(list)
                      = Cons(f(el), list)

And finally (no longer defining ‘double’ as a nested function):

// Miranda: doubleall = reduce (cons . double) nil
def double(n: Int) = 2 * n
def doubleall: List[Int] => List[Int] =
       reduce((Cons[Int] _).compose(double), Nil) _

The next step is to extract ‘map’:

// Miranda:
// doubleall = map double
// map f = reduce (cons . f) nil
def doubleall_map: List[Int] => List[Int] = map(double)
def map[A, B](f: A => B): List[A] => List[B] = 
      reduce((Cons[B] _).compose(f), Nil) _

Map could for example be used to calculate the sum in a matrix, a list of list of int.

// Miranda: summatrix = sum . map sum
val summatrix: List[List[Int]] => Int = sum.compose(map(sum))

Here is a fragment of the scala console in which I try out summatrix:

scala> import nl.grons.whyfp.Chapter3_GlueingFunctionsTogether._
scala> val a: List[List[Int]] = Cons(Nil, Cons(Cons(1, Cons(2, Nil)), Cons(Cons(4, Nil), Nil)))
a: nl.grons.whyfp.Chapter3_GlueingFunctionsTogether.List[nl.grons.whyfp.Chapter3_GlueingFunctionsTogether.List[Int]] = Cons(Nil,Cons(Cons(1,Cons(2,Nil)),Cons(Cons(4,Nil),Nil)))

scala> summatrix(a)
res0: Int = 7

Lets move on and create a tree.

//Miranda: treeof X ::= node X (listof (treeof X))
sealed abstract class Tree[+A] {
  def label: A
  def children: List[Tree[A]]
}

case class Node[A](l: A, c: List[Tree[A]]) extends Tree[A] {
  override def label: A = l
  override def children: List[Tree[A]] = c
}

The tree 1 o / \ / \ 2 o o 3 | | o 4 in Scala would look like:

Node(1,
     Cons( Node(2, Nil),
           Cons(Node(3,
                Cons(Node(4,
                     Nil),
                Nil)),
           Nil)))

Here is my version of ‘redtree’:

// Miranda:
// redtree f g a (node label subtrees) =
//         f label (redtree' f g a subtrees)
// redtree' f g a (cons subtree rest) =
//         g (redtree f g a subtree) (redtree' f g a rest)
// redtree' f g a nil = a
/**
 * Reduce a tree with functions f and g and constant a.
 * @param f a function to apply to a node label and
 *   the result of g on the node's subtree
 * @param g a function to apply to a subtree and
 *   the result of g on the following sibling subtrees
 * @param a the result for an empty tree
 * @param tree the tree to reduce
 * @param [F] the return type of f and redtree itself
 * @param [G] the return type of g
 * @param [A] the type of the tree labels
 * @return the result of the expression where in tree,
 *   each Node is replaced by f, each Cons by g, and Nil by x
 */
def redtree[F, G, A](f: (A, G) => F, g: (F, G) => G, a: G)(tree: Tree[A]): F = {
  def redchildren(children: List[Tree[A]]): G = children match {
    case Cons(subtree, rest) => g(redtree(f, g, a)(subtree), redchildren(rest))
    case Nil => a
  }

  f(tree.label, redchildren(tree.children))
}

Notice how Scala’s nested functions are more elegant then Miranda’s top level function ‘redtree' ’ (renamed to ‘redchildren’ here); in Scala the arguments of ‘redtree’ are not passed to ‘redchildren’ as the nested function can access them directly.

Lets use redtree to sum all integer labels in a tree of type Tree[Int]:

// Miranda: sumtree = add add 0
val sumtree: Tree[Int] => Int = redtree(add, add, 0)

Next: collecting all the labels in a list:

// Miranda: labels = redtree cons append nil
def labels[A]: Tree[A] => List[A] =
     redtree(Cons[A], append[A], Nil)

And the last example: create a new tree by applying a function to each label:

// Miranda: maptree f = redtree (node . f) cons nil
def maptree[A, B](f: A => B): Tree[A] => Tree[B] =
     redtree((Node[B] _).compose(f), Cons[Tree[B]], Nil)

} */

Parting thoughts

Although it took quite some time to write the code and then translate this to my 100th article, I had great fun doing so. In addition, I learned quite a bit on writing more complex Scala code.

The code for chapter 4 has already been translated, so expect another article soon.

P.S. RichFunction2

As was written above, Scala does not define ‘compose’ on Function2 (the class representing functions with 2 parameters), only on Function1 ‘compose’ is available. To remedy this, we’ll define it ourselves with the pimp-my-library pattern.

class RichFunction2[T1, T2, R](f: Function2[T1, T2, R]) {
 /**
  * (f compose g)(x, y) == f(g(x), y)
  */
 def compose[A](g: (A) => T1): (A, T2) => R = {
   (x: A, y: T2) => f(g(x), y)
 }
}
object RichFunction2 {
  // Note: complete return type is required here because the method
  // is used before it is declared.
  implicit def richFunction2[T1, T2, R](f: Function2[T1, T2, R]):
       RichFunction2[T1, T2, R] = new RichFunction2(f)
}

Thursday, August 12, 2010

Open source - why bother with anything else?

Since I work in a fine small company where we are breathing open source for at least a decade, it is sometimes weird to be confronted again by open source adversaries or agonists. For example, a colleague wrote a fine technical design based on Mule. All of a sudden we're asked to compare this to BEA AquaLogic and see whether we could implement the project with that. Now this is probably possible, and AquaLogic is probably a fine product family, but why bother?

Since there was already a Mule prototype, I found it a cumbersome idea. To get up to speed with AquaLogic you first have to find out what products in the AquaLogic family you need, and of course you only need a tiny bit of most of them. Secondly you have to go into a trajectory to get the software, including development licenses. Somehow money is not always a problem, but the time to get the products and start working always is, and the deadline won't move. Did I already mention I really hate bureaucracy?

Starting with open source often just takes 5 lines in a pom.xml and a few minutes of download time.

Okay, well suppose this was all taken care of and you are happily underway with development. You then run into a problem. Yes, you will, this is no different from open source. Now lets see how I typically deal with problems with using open source and see how this applies to closed source software.

Finding a solution to a problem with an open source product
1) (Re-)read the documentation
The first step is always to read the documentation. With the source jars attached in your favorite IDE, the javadoc is one key-press away.

2) Debugging
In step 2 we'll do some debugging. Again, with the source jars attached, tracing through your own code is as easy as tracing the open source code. I may not understand all the code, but I know what I need and I can read the JavaDoc of code encountered underway.
More often then not tracing leads to a deeper understanding of the used products, and I frequently find things that are useful for other parts of the project. The deeper understanding helps to form a solution. This can be changing your code, or patching the open source product. Otherwise it helps you formulate a more precise problem statement for the next steps.

3) Read more documentation
As I have now seen the code, I can search the available documentation more efficiently. So I do this first before going into the next step. Documentation in this phase can be anything, from manuals to blogs and forums.

4) Post questions
The last step is to post a question on a forum or mailing list. If you get this far, you are either lazy, you just missed something, or you found out that the library has a bug. The results of this step are not always satisfactory; it really depends on the community around the product. At least you can always change the open source product yourself.

Finding a solution to a problem with a closed source product

1) (Re-)read the documentation
Again, the first step is to read the documentation. However, javadoc is not always available. Rarely is it available in the form of a source jar.

2) Debugging
Oops, we can only trace our own code. Perhaps you can use de-compilation. Note that this might actually be illegal in your country (not in The Netherlands luckily). Secondly, the source might be obfuscated. And of course, I am not even talking about products that run as a complete separate program.

3) Read more documentation
As it is unlikely that debugging gave us more insight, we skip this step.

4) Post questions
In rare cases there are user communities around closed source. These are very valuable! However, usually you just have to ask the manufacturer. By lack of insight, the question won't be as detailed as with open source. And then we rely on the manufacturer. Some react quick and accurate, some don't even give you the the ability to ask questions. More often then not it costs lots of money and time. And of course meanwhile you will have to code workarounds yourself.

Conclusions

Despite the title I am not at all against closed and commercial software. Its just that as a programmer I find it hardly ever worth the bother.

Thursday, April 1, 2010

Another Wicket course coming up

I'll be doing the public Wicket introduction course from jweekend again on May 27 and 28. For more information see the link above or drop me a note.

Wicket root mounts

Just in case you missed it: I recently wrote an article that shows how to mount pages on the root in Wicket. As a lot of time went into creation the example application I posted the article on my employers blog.

Friday, October 30, 2009

Backup home directory to USB harddisk

As I am keen to install Ubuntu 9.10 on my work laptop, its time to do an extra backup of my home directory. My mp3 player has about 30 Gb free so I'll use that.

First attempt:

cp -R /home/erik /media/H300/Backup/

You wont believe how slow this is! All those little pesky files, we'll need to aggregate them.

Second attempt:

tar -cjf /media/H300/Backup/erik.tar.bz2 /home/erik

Waiting ... waiting ... Oops, that fails! The FAT file system on the mp3 player only supports files up to 2Gb.

After a long but fruitless search in the documentation of tar (and zip) for multi-volume archives options, I suddenly thought of the simplest thing that could work:

tar -cj /home/erik | \
split -d -b 1G - /media/H300/Backup/erik.tar.bz2.part

This creates a bunch of 1Gbyte files ending in part00, part01, etc.

You've got to love unix!

Wednesday, September 16, 2009

Wicket do's and dont's

Just published an article on my employer's blog: Wicket do's and dont's.

Friday, June 12, 2009

Extending my e-mail stack with Roundcube

I don't trust anyone with my most precious data: e-mail. That is why I run my own e-mail server. The server runs Ubuntu, Postfix, Dovecot and several tools for spam interception. I access my e-mail from several machines through the IMAP protocol (with TLS). Though any good IMAP client would do it is always Thunderbird (yes, even on my Mac).

It is however not always feasible to have Thunderbird around. Time to add an IMAP web client! Candidates are IMP, Squirrelmail and Roundcube. All of these projects continuously release security updates. So unless you go for the next (beta) Ubuntu release, the packaged version is almost always a few versions behind.

In the past I have already used IMP and Squirrelmail. IMP has a nice UI but was difficult to install. Squirrelmail looks really old, but has a huge amount of nice plugins.

So it became Roundcube this time. Roundcube is not out of beta for that long. But it does look really slick, almost as if it is a desktop app. I did go with the Ubuntu package after all. Lacking more documentation I had to figure out for myself that I needed to use /etc/roundcube/apache.conf as the basis for a virtual host file in the /etc/apache2/sites-available directory. After some smaller configuration tweaks in /etc/roundcube/main.inc.php I had the application as I wanted it.

One more thing: Roundcube (like IMP and Squirrelmail) continuously open and close new IMAP connections. This makes the site amazingly slow. The fix: package imapproxy. This package starts an IMAP server that caches IMAP connections to another IMAP server. Some small config changes, and Roundcube was a lot quicker.