Day to day stuff: August 2008

Wednesday, August 13, 2008

Java transaction boundary tricks

Controlling transactions is one of the fundamentals you must get right if your program has a database and more than one user (as just about every web application does). This article shows a little trick to get more out of transactions than most programs do. Since I am using Spring to demonstrate the principle, I will introduce Spring's transactions first. Feel free to skip that section if you know all about Spring transactions.

Using transactions from Spring
With Spring, controlling where a transaction starts can be as simple as adding the @Transactional annotation (plus one extra line in the Spring configuration). The @Transactional annotation is typically placed on the class that implements the interface of your service layer. For example, you may have a MemberService:

public interface MemberService {
  List<Member> findMembers(MemberCriteria criteria);
  Member getMember(long id);
  Member updateMember(Member member);
}

with the following implementation:

@Transactional
public class DefaultMemberService implements MemberService {
  public List<Member> findMembers(MemberCriteria criteria) { ... }
  public Member getMember(long id) { ... }
  public Member updateMember(Member member) { ... }
}

When you let Spring instantiate DefaultMemberService as a bean, Spring wraps it in a proxy, so that every call to one of its public methods that goes through that proxy is executed within a transaction.

Spring binds a transaction to the current thread (think ThreadLocal). The proxy can therefore join a transaction that is already active on the current thread, and with the default propagation setting (REQUIRED) that is exactly what happens. As a result you can safely call other transactional service methods from within your own transactional methods.
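To make the thread binding concrete, here is a hypothetical, Spring-free sketch of REQUIRED-style propagation. The names ThreadBoundTx and inTransaction are made up for illustration, and Spring's real implementation is considerably more involved. The point is only that a ThreadLocal lets a nested call discover, and join, the transaction its caller already started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical, Spring-free model of a thread-bound transaction with
 * REQUIRED-style propagation: the outermost call begins and ends the
 * transaction, nested calls on the same thread simply join it.
 */
public class ThreadBoundTx {
    // Nesting depth of the transaction on the current thread (0 = none).
    private static final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);
    // Begin/commit/rollback events, recorded for demonstration only.
    static final List<String> events = new ArrayList<>();

    static <T> T inTransaction(Supplier<T> work) {
        int outer = depth.get();
        if (outer == 0) {
            events.add("begin");          // no transaction on this thread yet
        }
        depth.set(outer + 1);
        boolean ok = false;
        try {
            T result = work.get();
            ok = true;
            return result;
        } finally {
            depth.set(outer);
            if (outer == 0) {
                events.add(ok ? "commit" : "rollback"); // only the outermost call ends it
            }
        }
    }

    public static void main(String[] args) {
        // One "service method" calling another: a single transaction is used.
        String result = inTransaction(() -> "outer+" + inTransaction(() -> "inner"));
        System.out.println(result + " " + events); // outer+inner [begin, commit]
    }
}
```

Because the nesting depth lives in a ThreadLocal, the inner call sees that a transaction is already active on its thread and does not begin a second one.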

Introducing the CommandService
Here is a kind of weird service that I came up with a couple of years ago. It is named after the Command pattern: the CommandService.

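import java.util.concurrent.Callable;
import org.springframework.transaction.annotation.Transactional;

@Transactional
public class DefaultCommandService implements CommandService {
  public void inTransaction(Runnable command) {
    command.run();
  }

  public <T> T inTransaction(Callable<T> command) {
    try {
      return command.call();
    } catch (RuntimeException e) {
      // Unchecked exceptions pass through unchanged.
      throw e;
    } catch (Exception e) {
      // Wrap checked exceptions; an unchecked exception
      // also makes Spring roll back by default.
      throw new RuntimeException(
          "CommandService#inTransaction("
          + command.toString() + ")", e);
    }
  }
}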

(Download link at bottom of article.)

As you can see, it doesn't do anything except execute the Runnable or Callable it is given. How can this be useful? Well, notice the @Transactional. Your block of code is now executed within a transaction!
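What does the @Transactional proxy add around DefaultCommandService? Roughly this: begin before the call, commit after it, and roll back when it throws. The class below is a hypothetical hand-written stand-in. No Spring is involved, and the "transaction" is just a log. It also shows why wrapping checked exceptions matters. By default Spring only rolls back on unchecked exceptions (RuntimeException and Error), so a wrapped checked exception still triggers a rollback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for what a transactional proxy adds around inTransaction. */
public class ManualCommandService {
    // Transaction events, recorded for demonstration only.
    final List<String> log = new ArrayList<>();

    public <T> T inTransaction(Callable<T> command) {
        log.add("begin");
        try {
            T result = command.call();
            log.add("commit");
            return result;
        } catch (RuntimeException e) {
            log.add("rollback");
            throw e;
        } catch (Exception e) {
            log.add("rollback");
            // Wrap checked exceptions, as DefaultCommandService does.
            throw new RuntimeException("inTransaction(" + command + ")", e);
        }
    }

    public static void main(String[] args) {
        ManualCommandService service = new ManualCommandService();
        int answer = service.inTransaction(() -> 6 * 7);
        System.out.println(answer); // 42
        try {
            service.inTransaction(() -> { throw new java.io.IOException("boom"); });
        } catch (RuntimeException e) {
            System.out.println(e.getCause()); // java.io.IOException: boom
        }
        System.out.println(service.log); // [begin, commit, begin, rollback]
    }
}
```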

Again, how can this be useful? Here are 3 use cases:

Use case 1: prevent fine grained service methods
Suppose you have a web page that updates only one property of an entity, e.g. the email address of a member, and another page that updates the password. To prevent concurrent update problems, you want to get and update the member within the same transaction. You could add a new method to the service interface for each property you want to change. However, it is good practice to keep interfaces concise, so instead we do something like this:

final long memberId = ...;
final String newEmail = ...;
Member freshMe =
    commandService.inTransaction(new Callable<Member>() {
  public Member call() throws Exception {
    // Read and write the member in one transaction.
    Member member = memberService.getMember(memberId);
    member.setEmail(newEmail);
    return memberService.updateMember(member);
  }
});

The member entity is retrieved and updated in the same transaction. This way we only need two methods in the member service interface (getMember and updateMember) to safely update any property of a member.

Use case 2: bundle operations for performance reasons
I once wrote some code that kept thousands of LDAP records in sync with a relational database. In my test environment, a dodgy laptop, the initial import of 15,000 records took 50 minutes. Each record was persisted to the database in its own transaction. After a small code change, I used the CommandService to persist the records in groups of 20. What do you think happened? The total time to import these 15,000 records dropped from 50 to 2 minutes! Not bad: 25 times faster with 4 extra lines of code.

Careful positioning of transaction boundaries can have a dramatic effect on performance. CommandService can help you do that.
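The LDAP import code itself is not part of the article. The sketch below shows the general shape of batching under stated assumptions: TxRunner is a hypothetical one-method stand-in for CommandService, and the record type and persist callback are placeholders. With 15,000 records and a batch size of 20, the import runs in 750 transactions instead of 15,000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: persist records in batches, one transaction per batch. */
public class BatchImport {
    /** Hypothetical minimal view of a CommandService: runs a block in one transaction. */
    interface TxRunner {
        void inTransaction(Runnable command);
    }

    static <R> void importInBatches(List<R> records, int batchSize,
                                    TxRunner tx, Consumer<R> persist) {
        for (int from = 0; from < records.size(); from += batchSize) {
            // A view on the next (at most) batchSize records.
            List<R> batch = records.subList(from, Math.min(from + batchSize, records.size()));
            tx.inTransaction(() -> batch.forEach(persist));
        }
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 15_000; i++) records.add(i);
        int[] transactions = {0};
        List<Integer> persisted = new ArrayList<>();
        // Count transactions instead of talking to a real database.
        importInBatches(records, 20, command -> { transactions[0]++; command.run(); }, persisted::add);
        System.out.println(transactions[0] + " transactions, " + persisted.size() + " records");
        // 750 transactions, 15000 records
    }
}
```

The batch size is a trade-off. Larger batches mean fewer commits, but also longer transactions and more work lost when one record in a batch fails.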

Use case 3: transactions in unusual places
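Let us look at a slightly more detailed version of the example from my previous article, which uses PeriodicExecutor. The changes are the hibernatePostalCodeDao field (with its setter) and the body of refreshPostalCache().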

public class CachingPostalCodeService {
  private final Object postalCacheLock = new Object();
  private List<PostalCode> postalCache;
  private PeriodicExecutor postalCacheReloader;
  private HibernatePostalCodeDao hibernatePostalCodeDao;
  
  public CachingPostalCodeService() {
    postalCacheReloader = new PeriodicExecutor(
          TimeUnit.MINUTES.toMillis(10), new Runnable() {
      public void run() {
        refreshPostalCache();
      }
      
      public String toString() {
        return "Postal code cache reloader";
      }
    });
  }
  
  public List<PostalCode> getPostalCodes() {
    postalCacheReloader.requestStart();
    synchronized (postalCacheLock) {
      return postalCache;
    }
  }
  
  private void refreshPostalCache() {
    List<PostalCode> newPostalCodes =
        Collections.unmodifiableList(
          hibernatePostalCodeDao.getAll()
        );
    synchronized (postalCacheLock) {
      postalCache = newPostalCodes;
    }
  }

  // ... setter for hibernatePostalCodeDao ...
}

If you try this, you may notice that the data is retrieved the first time, but not the second time! The exception message is fairly clear, something like "Hibernate requires a transaction". To understand this, let's follow the two execution paths.

The first time getPostalCodes() is called, it is called through Spring's transaction proxy, so the method is executed within a transaction. Because the cache is still empty, PeriodicExecutor runs the reload task synchronously: the runnable (which calls refreshPostalCache()) runs in the same thread and thus within the same transaction. No problem!

The second time getPostalCodes() is called (11 minutes later), PeriodicExecutor starts a new thread that executes the runnable defined in CachingPostalCodeService's constructor, and returns immediately. The important thing here is that the new thread does not join the transaction of its parent thread, because transactions are bound to threads. The runnable then calls refreshPostalCache(). You might expect a new transaction to start at that moment. However, a method call within the same class never goes through the transactional proxy: the instance on which we call refreshPostalCache() is simply this, not the proxy that was obtained from Spring. In other words, refreshPostalCache() is not executed within a transaction.
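The self-invocation problem is easy to reproduce without Spring. Spring proxies interface-based beans with JDK dynamic proxies (or with CGLIB subclasses), and in both cases only calls that come in through the proxy are intercepted. The hypothetical demo below wraps a simplified PostalService in a java.lang.reflect.Proxy. Wherever a real proxy would start a transaction, this one only records the intercepted call. Unlike in the article, refreshPostalCache() is public here, which shows that even a public self-call bypasses the proxy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: a JDK dynamic proxy does not see calls made via 'this'. */
public class SelfInvocationDemo {
    // Names of the methods that passed through the proxy.
    static final List<String> intercepted = new ArrayList<>();

    interface PostalService {
        void getPostalCodes();
        void refreshPostalCache();
    }

    static class PostalServiceImpl implements PostalService {
        public void getPostalCodes() {
            refreshPostalCache(); // same as this.refreshPostalCache(): bypasses the proxy
        }
        public void refreshPostalCache() { }
    }

    /** Wraps target like an interface-based transactional proxy would (minus the transaction). */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // a real proxy would begin a transaction here
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        PostalService service = wrap(new PostalServiceImpl(), PostalService.class);
        service.getPostalCodes();
        System.out.println(intercepted); // [getPostalCodes]
    }
}
```

Only getPostalCodes is recorded. The nested refreshPostalCache() call goes straight to the target object.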

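CommandService to the rescue. The changes are the new commandService field (with its setter) and the reloader's run() method, which now calls refreshPostalCache() through the CommandService: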

public class CachingPostalCodeService {
  private final Object postalCacheLock = new Object();
  private List<PostalCode> postalCache;
  private PeriodicExecutor postalCacheReloader;
  private HibernatePostalCodeDao hibernatePostalCodeDao;
  private CommandService commandService;
  
  public CachingPostalCodeService() {
    postalCacheReloader = new PeriodicExecutor(
          TimeUnit.MINUTES.toMillis(10), new Runnable() {
      public void run() {
        commandService.inTransaction(new Runnable() {
          public void run() {
            refreshPostalCache();
          }
        });
      }

      public String toString() {
        return "Postal code cache reloader";
      }
    });
  }

  // ... same as above ...
  
  // ... setters for hibernatePostalCodeDao and commandService
}
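The thread part of the story, and the fix, can be modeled with a plain ThreadLocal. In this hypothetical sketch, inTx stands in for Spring's thread-bound transaction state and inTransaction stands in for the CommandService. A new thread starts with its own, empty copy of the ThreadLocal. The reload task therefore only sees a transaction when it is wrapped inside that new thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: thread-bound state does not follow a task into a new thread. */
public class ThreadBoundDemo {
    // Stand-in for Spring's thread-bound transaction state: a plain ThreadLocal,
    // so a newly started thread gets its own, initially false, value.
    static final ThreadLocal<Boolean> inTx = ThreadLocal.withInitial(() -> false);

    /** CommandService-like wrapper: marks the *current* thread as transactional. */
    static void inTransaction(Runnable command) {
        boolean outer = inTx.get();
        inTx.set(true);
        try {
            command.run();
        } finally {
            inTx.set(outer);
        }
    }

    /** Starts the reload in a new thread from inside a "transaction"; reports what it saw. */
    static boolean reloadSeesTransaction(boolean wrapInCommandService) {
        AtomicBoolean sawTx = new AtomicBoolean();
        Runnable refresh = () -> sawTx.set(inTx.get());
        Runnable task = wrapInCommandService ? () -> inTransaction(refresh) : refresh;
        Thread[] child = new Thread[1];
        inTransaction(() -> {            // the parent thread is inside a transaction...
            child[0] = new Thread(task); // ...but the child thread is not
            child[0].start();
        });
        try {
            child[0].join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawTx.get();
    }

    public static void main(String[] args) {
        System.out.println(reloadSeesTransaction(false)); // false
        System.out.println(reloadSeesTransaction(true));  // true
    }
}
```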

Download
Feel free to use the complete version of CommandService in any way you see fit.

Conclusions
Watching your transaction boundaries can be very rewarding, both in code size and in performance. A command service such as the one presented in this article can help you do that.

Wednesday, August 6, 2008

Asynchronous cache updates

This is the second article on Java concurrency. The first is Java thread control.

Many applications use more or less static reference data such as postal codes, exchange rates and externally stored text. Often this type of data can be safely cached in memory for performance reasons. To make sure that the cached data does not become stale for too long, it can be reloaded every 10 minutes, once a day or on any other schedule you might like.

Rather than writing custom code for every type of reference data, one can extract the hard concurrency code into a utility class. This article presents PeriodicExecutor, a utility class I wrote years ago that has proved so useful and reliable that it has traveled with me from project to project.

Here are the requirements I had in mind while writing PeriodicExecutor:

Support any data retrieval and any cache mechanism.
Refresh the cache in the background, do not slow down request handling.
No idle threads, only start a thread when one is needed.
Be resilient against errors, retry when they occur.
Support flexible reloading schedules.
Support initial loading of the memory cache.

Let's discuss these requirements and see how they are implemented.

1. Support any data retrieval and any cache mechanism.
Each type of data source needs its own retrieval code, for example JDBC code for a database and something like HttpClient for a remote service. The cache implementation can also differ from case to case. In one case we just need to store a String, in another we first transform the data into a Map. Therefore the utility class only solves the multi-threaded scheduling problem.

PeriodicExecutor solves this by letting the client code provide its reload task as a Runnable. The task is also responsible for making the results available when reloading is finished.

2. Refresh the cache in the background, do not slow down request handling.
When it's time for a data reload, it makes no sense to delay normal request processing until the reload is done. Remember, we are caching for performance reasons! Therefore PeriodicExecutor executes the client task in a separate thread.

PeriodicExecutor guarantees that the task is never run in parallel, but the client task must make sure that there are no concurrency problems the moment the cache is updated.

3. No idle threads, only start a thread when one is needed.
There are two approaches to thread handling. We can either start a thread upon initialization and keep it running all the time, or we can start a new, short-lived thread each time a reload is required. The first approach has the disadvantage that it uses more resources than necessary (though since Java 5 all reloaders could share one executor service). Another inconvenience is that we need explicit code to shut down the thread (or executor service). An advantage of this approach is that a thread can schedule reloads precisely.

I chose the second approach. Each time some code accesses the cache, it must call PeriodicExecutor#requestStart(), which checks whether the schedule dictates a reload. If so, the reload runs in a newly started background thread. A small disadvantage of this approach is that after a long period of inactivity, the first caller triggers a reload, but until the reload is finished all cache users see stale data.
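The PeriodicExecutor download is not reproduced here. The following sketch is a rough, hypothetical illustration of how requestStart() can combine three things: a cheap schedule check, a synchronous first load, and a background reload that never runs twice at the same time. The class name MiniPeriodicExecutor and all of its internals are assumptions and are not taken from the real PeriodicExecutor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, stripped-down illustration of the requestStart() idea.
 * Not the actual PeriodicExecutor: no logging, no back-off, no
 * overridable schedule, no requestStart(boolean).
 */
public class MiniPeriodicExecutor {
    private final long periodMillis;
    private final Runnable task;
    private final AtomicBoolean running = new AtomicBoolean(false);
    private volatile boolean loadedOnce = false;
    private volatile long nextExecuteTime = 0L; // 0 = due immediately

    public MiniPeriodicExecutor(long periodMillis, Runnable task) {
        this.periodMillis = periodMillis;
        this.task = task;
    }

    /** Call on every cache access; cheap when no reload is due. */
    public void requestStart() {
        if (System.currentTimeMillis() < nextExecuteTime) {
            return;                                   // not due: no thread is started
        }
        if (!loadedOnce) {
            synchronized (this) {                     // first load: block and rethrow
                if (!loadedOnce) {
                    runTask();
                    loadedOnce = true;
                }
            }
        } else if (running.compareAndSet(false, true)) { // never run the task in parallel
            Thread reloader = new Thread(() -> {
                try {
                    runTask();
                } catch (RuntimeException e) {
                    // Swallowed; nextExecuteTime is unchanged, so the next request retries.
                } finally {
                    running.set(false);
                }
            }, task.toString());
            reloader.setDaemon(true);                 // do not keep the JVM alive
            reloader.start();
        }
    }

    private void runTask() {
        task.run();
        // Reached only if the task succeeded.
        nextExecuteTime = System.currentTimeMillis() + periodMillis;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        MiniPeriodicExecutor reloader = new MiniPeriodicExecutor(60_000, () -> loads[0]++);
        reloader.requestStart(); // synchronous first load
        reloader.requestStart(); // not due yet: returns immediately
        System.out.println(loads[0]); // 1
    }
}
```

Making the reloader thread a daemon is a design choice of this sketch. It means that a reload in progress never prevents the JVM from exiting.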

4. Be resilient against errors, retry when they occur.
The reference data may be retrieved from an unreliable data source. When a reload fails, it should be tried again later. Normal processing can continue, as the background thread isolates it from exceptions. In addition, the reload task should update the memory cache only after the reload has succeeded.

PeriodicExecutor detects reload errors by catching exceptions thrown by the reload task. When an error occurs, the next call to requestStart() triggers a second reload attempt. After two failed attempts, the next reload is delayed for a couple of minutes to prevent thrashing.
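These retry rules boil down to a small scheduling decision. Here is a hypothetical version written as a pure function. The two-minute back-off stands in for "a couple of minutes" and is an assumption, as is the idea of tracking consecutive failures in a counter.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical retry policy in the spirit of requirement 4 (numbers are assumptions). */
public class RetryPolicy {
    // Assumed value for "a couple of minutes".
    static final long BACK_OFF_MILLIS = TimeUnit.MINUTES.toMillis(2);

    /**
     * @param lastAttempt  time the last attempt ended (epoch millis)
     * @param failures     number of consecutive failed attempts so far
     * @param periodMillis regular reload period
     * @return earliest time at which the next attempt may start
     */
    static long nextAttemptTime(long lastAttempt, int failures, long periodMillis) {
        if (failures == 0) {
            return lastAttempt + periodMillis;    // success: follow the normal schedule
        } else if (failures == 1) {
            return lastAttempt;                   // one failure: retry on the next request
        } else {
            return lastAttempt + BACK_OFF_MILLIS; // repeated failures: back off
        }
    }

    public static void main(String[] args) {
        long period = TimeUnit.MINUTES.toMillis(10);
        System.out.println(nextAttemptTime(0L, 0, period)); // 600000
        System.out.println(nextAttemptTime(0L, 1, period)); // 0
        System.out.println(nextAttemptTime(0L, 2, period)); // 120000
    }
}
```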

5. Support flexible reloading schedules.
PeriodicExecutor normally reloads with a fixed delay. However, it is easy to override this by providing your own next reload time. A future improvement would be to extract this code into a separate strategy, following the Strategy pattern.

6. Support initial loading of the memory cache.
Initially the memory cache is empty. As we are talking about reference data, most programs cannot sensibly continue without it. PeriodicExecutor therefore also supports synchronous loading and exception rethrowing. During a synchronous load the caller is blocked until the reload task is done. Until the first task has completed, the default is to run synchronously and rethrow exceptions from the reload task. Once a task has completed, the default is to run asynchronously and to swallow exceptions (they are logged, of course).

Example
Enough theory, how do you use it? Here is a simple example:

public class CachingPostalCodeService {
  private final Object postalCacheLock = new Object();
  private List<PostalCode> postalCache;
  private PeriodicExecutor postalCacheReloader;
  
  public CachingPostalCodeService() {
    postalCacheReloader = new PeriodicExecutor(
          TimeUnit.MINUTES.toMillis(10), new Runnable() {
      public void run() {
        refreshPostalCache();
      }
      
      public String toString() {
        return "Postal code cache reloader";
      }
    });
  }
  
  public List<PostalCode> getPostalCodes() {
    postalCacheReloader.requestStart();
    synchronized (postalCacheLock) {
      return postalCache;
    }
  }
  
  private void refreshPostalCache() {
    List<PostalCode> newPostalCodes = ....
    // Don't forget to make newPostalCodes unmodifiable.
    synchronized (postalCacheLock) {
      postalCache = newPostalCodes;
    }
  }
}

Notice that only a limited amount of code is needed to get all the discussed features. In this example a list of postal codes is cached in memory. The only user of the cache, method getPostalCodes, calls requestStart every time. Initially, when no data is in the cache, requestStart will block until the data has been loaded. After 10 minutes of use, the cache will be reloaded in the background. Note how the example synchronizes on postalCacheLock to prevent concurrency problems during the cache update.
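A common variation, which the article does not use, publishes the cache through a volatile field instead of a lock. The list is built completely and made unmodifiable before the single reference assignment, so readers always see either the complete old list or the complete new list. To stay self-contained, this sketch uses String instead of the article's PostalCode type and leaves out the PeriodicExecutor call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical variation: publish the cache through a volatile field instead of a lock. */
public class VolatilePostalCache {
    // A volatile write followed by a volatile read guarantees readers see the fully built list.
    private volatile List<String> postalCache = Collections.emptyList();

    public List<String> getPostalCodes() {
        return postalCache; // a single volatile read, no lock needed
    }

    void refreshPostalCache(List<String> loaded) {
        // Copy and wrap first; the reference is swapped only when the new list is complete.
        List<String> newPostalCodes = Collections.unmodifiableList(new ArrayList<>(loaded));
        postalCache = newPostalCodes;
    }

    public static void main(String[] args) {
        VolatilePostalCache cache = new VolatilePostalCache();
        cache.refreshPostalCache(Arrays.asList("1011", "2511"));
        System.out.println(cache.getPostalCodes()); // [1011, 2511]
    }
}
```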

Variations
Sometimes you only need to reload once a day, preferably early in the morning. The following example shows how to override the schedule so that the reload takes place at 5 o'clock in the morning.

  public CachingPostalCodeService() {
    postalCacheReloader = new PeriodicExecutor(
          0, new Runnable() {
      // ...as above...
    }) {
      @Override
      protected long nextExecuteTime(long currentExecute) {
        return getNextTime(5);
      }
    };
  }

  /**
   * @param hour the requested hour
   * @return the next time it is the given hour
   *   in the JVM default time zone
   */
  private static long getNextTime(int hour) {
    Calendar cal = Calendar.getInstance();
    cal.set(Calendar.HOUR_OF_DAY, hour);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    if (cal.getTimeInMillis() < System.currentTimeMillis()) {
      cal.add(Calendar.DAY_OF_MONTH, 1);
    }
    return cal.getTimeInMillis();
  }
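On Java 8 and later, the same calculation can be written with java.time. The following is a hypothetical alternative to getNextTime. The current time is passed in as a parameter, which makes it easy to test. As in the original, a time exactly on the hour counts as "next".

```java
import java.time.LocalTime;
import java.time.ZonedDateTime;

/** Hypothetical java.time alternative to getNextTime (Java 8+). */
public class NextTime {
    /**
     * @param now  the current time, including the time zone to use
     * @param hour the requested hour (0-23)
     * @return the first moment at or after now at which the clock shows hour:00
     */
    static ZonedDateTime nextTime(ZonedDateTime now, int hour) {
        // Same date as now, at hour:00:00.000, resolved in now's time zone.
        ZonedDateTime candidate = now.with(LocalTime.of(hour, 0));
        if (candidate.isBefore(now)) {
            candidate = candidate.plusDays(1); // that hour has passed today: use tomorrow
        }
        return candidate;
    }

    public static void main(String[] args) {
        long millis = nextTime(ZonedDateTime.now(), 5).toInstant().toEpochMilli();
        System.out.println(millis);
    }
}
```

In the nextExecuteTime override, you could then return nextTime(ZonedDateTime.now(), 5).toInstant().toEpochMilli().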

If you need more control on when to execute synchronously or not, take a look at method PeriodicExecutor#requestStart(boolean).

Download
Download PeriodicExecutor. PeriodicExecutor can also be used with Java 1.4 (and perhaps 1.3 as well).

Other options
Probably the most complete open-source library for task scheduling is Quartz.

Enjoy!

Saturday, August 2, 2008

Java thread control

As promised in my book review of Effective Java 2nd edition, here is an article that shows a neat trick to control threads.

According to the book, the recommended way to stop a running thread is to let it poll a volatile boolean field that acts as a stop flag:

import java.util.concurrent.TimeUnit;

public class StopThread {
  private static volatile boolean stopRequested;
  
  public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(new Runnable() {
      public void run() {
        int i = 0;
        while (!stopRequested) i++;
      }
    });
    t.start();

    TimeUnit.SECONDS.sleep(1);
    stopRequested = true;
  }
}

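(In short, without volatile the background thread may never see the update, because the compiler is allowed to hoist the read of the flag out of the loop. See the book for the details.) This is all nice, but this thread can only be started and stopped once. Here is a better implementation: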

import java.util.concurrent.TimeUnit;

public class StopThread {
  private static volatile Thread currentThread;
  
  public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(new Runnable() {
      public void run() {
        int i = 0;
        while (currentThread == Thread.currentThread()) i++;
      }
    });
    currentThread = t;
    t.start();

    TimeUnit.SECONDS.sleep(1);
    currentThread = null;
  }
}

By storing a reference to the running thread, the running thread itself can easily see whether the rest of the system still thinks it should be running. Restarting the thread is now easy: assign a new thread instance to currentThread and start it, and the old thread will exit at its next check.

The previous program just demonstrates the principle. A better starting point is the following program:

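import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RestartableThread {
  // Assumes SLF4J; any logger with error(String, Throwable) will do.
  private static final Logger logger = LoggerFactory.getLogger(RestartableThread.class);
  private final AtomicInteger threadNumber = new AtomicInteger(0);
  private volatile Thread currentThread;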

  public RestartableThread() {
    startThread();
  }

  private void startThread() {
    currentThread = new Thread(
      new RestartableRunnable(), "Restartable thread " + threadNumber.getAndIncrement());
    currentThread.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread t, Throwable e) {
        startThread();
        logger.error(String.format("Thread '%s' has an uncaught exception and was restarted", t.getName()), e);
      }
    });
    currentThread.start();
  }

  public void shutdown() {
    currentThread = null;
  }

  private class RestartableRunnable implements Runnable {
    public void run() {
      while (true) {
        if (currentThread != Thread.currentThread()) {
          return;
        }

        try {
          // Do a bit of work. Make sure this takes a limited time.

        } catch (Exception ex) {
          logger.error("exception", ex);
        }
      }
    }
  }
}

You can stop the thread by calling the shutdown() method. If you forget to do so on program exit, the JVM will not stop, because this is not a daemon thread.
The runnable contains a try/catch to prevent the thread from dying. As a second safety net, the uncaught exception handler is triggered for unhandled Throwables such as OutOfMemoryError. The exception handler then simply starts a new thread so that execution continues. There are subtle advantages to this double approach. Exceptions are normally not so catastrophic, so the thread can just continue. Out of memory errors, however, are better dealt with by letting the thread terminate, which makes all the memory it used available to the garbage collector.
If for any other reason you want to (re-)start the thread (for example because it is programmed to run only for a limited time), simply call startThread().
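The restart trick can be demonstrated in isolation. In this hypothetical sketch, restart() plays the role of startThread(): it starts a new worker and points the volatile currentThread field at it. The previous worker leaves its loop as soon as it notices that it is no longer current.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: restart a worker by pointing a volatile field at a new thread. */
public class GenerationalWorker {
    private volatile Thread currentThread;
    final AtomicLong iterations = new AtomicLong();

    /** Starts a fresh worker; the previous one exits at its next check. */
    synchronized Thread restart() {
        Thread t = new Thread(() -> {
            // Work only while this thread is still the one the system wants.
            while (currentThread == Thread.currentThread()) {
                iterations.incrementAndGet(); // a bit of (bounded) work
                Thread.yield();
            }
        });
        currentThread = t; // assign before start, so the new thread sees itself as current
        t.start();
        return t;
    }

    void stop() {
        currentThread = null;
    }

    /** Waits up to millis for t to end; returns true if it ended. */
    static boolean endsWithin(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        GenerationalWorker worker = new GenerationalWorker();
        Thread first = worker.restart();
        Thread second = worker.restart(); // 'first' notices that it was replaced
        System.out.println("first ended: " + endsWithin(first, 1000));
        worker.stop();
        System.out.println("second ended: " + endsWithin(second, 1000));
    }
}
```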

Make sure that the run condition is checked regularly. Consumer threads that wait on something like a queue should wake up regularly. For example, if you are waiting for new entries in a BlockingQueue, this is better done with poll(timeout, unit) than with take().
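Here is a hypothetical consumer along those lines. It polls a LinkedBlockingQueue with a 100 ms timeout (an arbitrary value), so that even when no items arrive it checks the currentThread condition at least that often. In this sketch, items still in the queue at shutdown are simply left there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical consumer that re-checks its run condition at least every 100 ms. */
public class QueueConsumer {
    final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final List<String> consumed = Collections.synchronizedList(new ArrayList<>());
    private volatile Thread currentThread;

    Thread start() {
        Thread t = new Thread(() -> {
            while (currentThread == Thread.currentThread()) {
                try {
                    // take() may block forever; poll() gives up after the timeout,
                    // so the loop condition is checked again.
                    String item = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (item != null) {
                        consumed.add(item);
                    }
                } catch (InterruptedException e) {
                    return; // treat interruption as a stop request
                }
            }
        }, "queue consumer");
        currentThread = t;
        t.start();
        return t;
    }

    void shutdown() {
        currentThread = null;
    }

    public static void main(String[] args) throws InterruptedException {
        QueueConsumer consumer = new QueueConsumer();
        Thread t = consumer.start();
        consumer.queue.put("hello");
        Thread.sleep(300);
        consumer.shutdown();
        t.join(); // returns within about 100 ms of shutdown()
        System.out.println(consumer.consumed);
    }
}
```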

Conclusion
Despite the presence of the new and shiny executor services, it is still sometimes necessary to write your own thread. I hope that this little article will get you started on a robust yet flexible implementation.

Next thread article: a helper for asynchronous cache updates.

Friday, August 1, 2008

Effective Java 2nd edition - Book review

If you are even a bit serious about your Java programming, this book is a must-read. No other book has improved the quality of my Java programming so dramatically.

Now, this was my opinion of the first edition. Having read the second edition, I can tell you that the same top quality is there, with lots and lots of big and tiny updates on top.

OK, with that out of the way, here are some comments:

In the introduction it says 'This book is not intended to be read from cover to cover'. Ignore that, read it from cover to cover anyway!
Initially I missed an overview of what changed since the 1st edition. But when I started reading I understood that such a list would be ridiculous. Every chapter has been improved in tiny, and sometimes big ways.

And now for some small points of criticism:

Item 4: This item is about enforcing noninstantiability of classes that contain only static methods. However, apart from saying that instantiating such a class is nonsensical, it fails to give a good reason for preventing it. I would say: don't bother. Let silly programmers instantiate the class. First, they cannot do much harm with it; second, any good IDE warns you about it.
Item 8: When overriding equals, one should use == instead of equals to compare fields of enum type. The item, however, only recommends == for primitive fields (other than float and double) and does not mention enums.
Item 60: I disagree with the advice to throw NullPointerException when an argument is null while the contract of the method does not allow this. An NPE is also thrown by the JVM and is therefore better suited to indicate a programming error inside the method. In my opinion, IllegalArgumentException is better suited to express that a method is used wrongly. In other words: an NPE indicates an error inside the method, an IAE indicates an error outside of the method.
Item 66: I recently saw a neater trick to control threads. More on this in the next post.
Item 68: One thing that still puzzles me is how to write a consumer task with the new executor services. This item does not make that clear. Perhaps this use case still warrants creating your own thread.
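To illustrate the Item 8 and Item 60 remarks, here is a hypothetical value class. Its equals method compares the enum field with ==, which is safe because each enum constant is a singleton. Its constructor follows the IllegalArgumentException style preferred above. Objects.requireNonNull, by contrast, throws a NullPointerException, in line with the book's advice.

```java
import java.util.Objects;

/** Hypothetical value class illustrating the Item 8 and Item 60 remarks above. */
public final class Account {
    enum Status { ACTIVE, BLOCKED }

    private final String owner;
    private final Status status;
    private final int number;

    Account(String owner, Status status, int number) {
        // Item 60, the preference argued above: a null argument is the caller's mistake.
        // (Objects.requireNonNull would throw NullPointerException instead.)
        if (owner == null || status == null) {
            throw new IllegalArgumentException("owner and status must not be null");
        }
        this.owner = owner;
        this.status = status;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return number == other.number        // int primitive: ==
            && status == other.status        // enum: constants are singletons, so == suffices
            && owner.equals(other.owner);    // object reference: equals
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, status, number);
    }

    public static void main(String[] args) {
        Account a = new Account("ann", Status.ACTIVE, 1);
        System.out.println(a.equals(new Account("ann", Status.ACTIVE, 1)));  // true
        System.out.println(a.equals(new Account("ann", Status.BLOCKED, 1))); // false
    }
}
```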

Happy reading!

Update 2008-08-03: refined my criticism of throwing an NPE vs. an IAE.