Thursday, December 13, 2007

What flavor of closures?

I just attended Josh Bloch's presentation at JavaPolis, where he asks the community whether they want Java to support function types, or if they'd prefer that people write these things the way they do today. His examples are carefully selected from the most twisted of the test suite. Compiler test suites are a good place to find the most twisted but unrealistic uses of any given language feature. I thought it would be interesting to look at the question in the context of a real API. You probably know my opinion, but just to be clear, here is an excerpt from Doug Lea's fork-join framework

 * An object with a function accepting pairs of objects, one of
 * type T and one of type U, returning those of type V
interface Combiner<T,U,V> {
  V combine(T t, U u);
class ParallelArray<T> {
   * Returns a ParallelArray containing results of applying
   * combine(thisElement, otherElement) for each element.
  <U,V> ParallelArray<V> combine(
    ParallelArray<U> other,
    Combiner<? super T, ? super U, ? extends V> combiner) { ... }

And the equivalent code ported to use the features of the closures spec:

class ParallelArray<T> {
   * Returns a ParallelArray containing results of applying
   * combine(thisElement, otherElement) for each element.
  <U,V> ParallelArray<V> combine(
    ParallelArray<U> other,
    { T, U => V } combiner) { ... }

The question Josh asks is this: which version of this API would you prefer see?

The point he makes is that function types enable (he says "encourage") an "exotic" style of programming - functional programming - which should be discouraged, otherwise the entire platform will become infected with unreadable code. Although functional programming is just as possible with or without function types - they are just shorthand for interface types, after all - Josh prefers the language provide syntactic vinegar for these techniques.

Part of his talk was about the problems of being able to use nonlocal return by default in a closure. See my previous blog post for a description of how this theoretical problem won't exist in the next version of the spec, and doesn't exist in the prototype today.

Finally, Josh showed that if you want to use something like eachEntry to loop over a map, and you want to be able to use primitive types for the loop variables, autoboxing doesn't work and you'd have to define 81 different versions of the eachEntry method (one for each possible primitive type in each position). That's true, just as it's true that you'd have to define 81 different versions of the Map API if you want to be able to handle primitives in them. If it turns out to be a good idea to make autoboxing work for the incoming arguments to a closure, that is a small tweak to the closure conversion. These kinds of issues can be addressed in a JSR.

Josh's vision for an alternative is Concise Instance Creation Expressions along with adding a moderate number of new statement forms.

Monday, December 03, 2007

Restricted Closures

Note: this discusses a feature of the Closures specification that was published back in February, but which is likely to change in an upcoming revision.

The Closures for Java specification, version 0.5, contains a special marker interface java.lang.RestrictedFunction. When a closure is converted to an interface that extends RestrictedFunction, this prevents the closure from doing certain operations. Specifically, it prevents accessing mutated local variables from an enclosing scope, or using a break, continue, or return to a target outside the closure. The idea is that APIs that are intended to be used in a concurrent setting would want to receive restricted rather than unrestricted closures to prevent programmers from shooting themselves in the foot.

Two weeks ago Mark Mahieu contacted me regarding his experience with the closures version of the fork-join framework. Because I had ported that API before I had implemented any of the operations that would be restricted, and before RestrictedFunction itself, I had simply not provided any restrictions at all. Mark was wondering how to do it:

I hadn't looked at the jsr166y javadoc before you linked to it on your blog, so I had the chance to compare the two versions on equal terms, and I can honestly say that I found the closures version of the API to be much more approachable at first blush. I also suspect that the majority of the Java programmers I work with would feel the same way, once comfortable with function type syntax.

One thing I did wonder was whether a method like ParallelArray.combine() could be declared as:

public <U,V,C extends {T,U=>V} & RestrictedFunction> ParallelArray<V> combine(ParallelArray<U> other, C combiner) { ... }

but my reading of the specification suggests that the type C won't be a valid target for closure conversion. Maybe I'm being greedy, but in certain cases (jsr166y being a good example) I'd ideally want both the clarity provided by using function types in place of a multitude of interfaces, and the compile-time checking afforded by RestrictedFunction. Having said that, I think the additional type parameter above negates those gains in clarity somewhat, even if it were an option.

I responded, describing what I had been planning to do in the next minor update of the spec:

I expect to make that work. However, I hope it won't be necessary. I expect to support function types like


directly. For example

public <U,V> ParallelArray<V> combine(ParallelArray<U> other, {T,U=>V}&RestrictedFunction combiner) { ... }

You will be allowed to intersect a function type with non-generic marker interfaces such as RestrictedFunction, Serializable, etc. Unfortunately, I will have to rev the spec to support this.

Since that time I've been discussing this issue with a number of people. Some, who believe that the concurrent use cases are primary, or who believe that "Mort" programmers will blithely copy-and-paste code from anonymous inner classes (which have different semantics) into closures, suggest that the default is backwards: closures and function types should be restricted unless specific action is taken to make them otherwise. Reversing the sense of the marker interface doesn't work (it violates subtype substitutability), but there may be other ways to accomplish it. On the other hand, there are others who believe the synchronous use cases, such as control APIs, are primary (even when used in a concurrent setting), and prefer not to see the language cluttered with support for the restictions at all. Instead, they would prefer that any such restrictions take the form of warnings (which the programmer might suppress or ask javac to escalate to errors). I have sympathy for both camps.

Another possibility would be to produce a warning whenever you use a nonlocal transfer at all and do away with RestrictedFunction. The way to suppress the warning would be with a @SuppressWarning("nonlocal-transfer") annotation. Could we make it an error instead of a warning? This may make the interface easier to read, but it doesn't give the API designer any way to express a preference. It may make control APIs painful to use.

Finally, it would be possible to use a different syntax for restricted and unrestricted function types and closures. For example, one using the => token would be restricted, not allowing nonlocal transfers. One using a different token such as ==> or #> would be unrestricted, allowing nonlocal transfers. The idea is that if you want an unrestricted closure, you'd have to use the slightly more awkward syntax, and the receiving type must also be of the unrestricted variety. The control invocation syntax would be defined in terms of the unrestricted form. This enables API designers to express a preference for whether or not clients would be allowed to write unrestricted closures (and therefore, whether or not they would be allowed to use the control invocation syntax).

This can be made to work using only concepts already in the spec. The unrestricted form of a function type would be defined as an interface type as in the current spec. The restricted form would be the same but with RestrictedFunction mixed in. With this approach there is no need for the explicit "&" conjunction-type syntax for function types.

Tuesday, November 20, 2007

Closures Prototype Update and Extension Methods

Closures Prototype Update

The Closures for Java prototype now allows a closure to access mutated local variables from an enclosing scope. You can download the prototype here. You can also download the sources for the rewritten parts of Doug Lea's fork-join library, ported to use function types. It is a good example of how APIs can be affected by these language changes. Personally, I find the API simplifications to be quite compelling. If you're on the fence about function types, I recommend you have a look. Any feedback you may have is most welcome!

I mentioned previously that I'm working on a number of smaller language features, which hopefully will be considered for JDK7. For now, I'd like to talk about just one of them.

Extension Methods

Once an API has been published, you can't add methods to it without breaking backward compatibility. That's because implementations of the old version of the interface don't provide an implementation of any new methods. You can use abstract classes instead of interfaces, and only add concrete methods in the future. Unfortunately, that limits you to single inheritance.

One way API designers work around this limitation is to add new functionality to an interface by writing static utility functions. For example, java.util.Collection.sort acts as an extension of java.util.List. But such methods are less convenient to use, and code written using them is more difficult to read.

Extension methods enable programmers to provide additional methods that clients of an interface can elect use as if they are members of the interface. Todd Millstain's Expanders are the most full-featured version of this feature. The simplest version of this feature that I advocate would be to enable statically-imported methods to be used as if members of their first argument type. For example:

    import static java.util.Collections.sort;
    List<String> list = ...;

Extension methods are completely orthogonal to closures, but they enable a number of typical functional-style programming patterns to be expressed more directly in Java using extension methods that accept closures.

Sunday, October 28, 2007

Java Closures: first prototype

I've finally had some time to make progress on a prototype of closures. If you want to see what an API looks like, you can compare Doug Lea's jsr166y fork-join framework to the same API ported to use the language features of the prototype.

If you want to try it, you can download an executable version of the prototype here. Make sure a JDK6 version of java and javac are on your path. This is binary-licensed under the JRL, but if a JSR is created I expect to license it under GPLv2. There are a few small test cases included.

This prototype supports

  • the BGGA function type syntax
  • closure literals
  • the closure conversion
  • the null type, disjunctive types, and exception transparency
  • definite assignment
  • Unreachable and completion transparency.
  • Catching multiple exceptions at once like catch(X1|X2 ex) { ...
    [not closely related to closures but the implementation was simple once closures are there]

This prototype does not yet support

  • a closure using a mutated variable from the enclosing scope
  • nonlocal control-flow (break, return, and continue)
  • the control invocation statement and loop abstractions

I'm intentionally distributing it before these features are available. The idea is that people can try this version, and compare it to the next version with these features working.

Separately, I'm working on a set of smaller language extensions for JDK7, some of which interact very nicely with Closures. For example, "extension methods" enable you to have the effect of adding methods to existing interfaces (e.g. adding "each", "filter", etc to Collection) without breaking backward compatibility. I'll write more about these over the next few days.

This is still rough around the edges, but any feedback you have is most welcome.

Tuesday, July 31, 2007

Internal Versus External Iterators

In the "Gang Of Four" Patterns book's discussion of the Iterator pattern, we read (page 260):

Who controls the iteration? A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator (C++ and Java), and when the iterator controls it, the iterator is an internal iterator (Lisp and functional languages). Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element in the aggregate.

External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators. Internal iterators are especially weak in a language like C++ that does not provide anonymous functions, closures, or continuations like Smalltalk and CLOS. But on the other hand, internal iterators are easier to use, because they define the iteration logic for you.

To make this very concrete, one might define a collection-like interface using external iterators like this:

public interface ExternalIterable<T> {
    ExternalIterator<T> iterator();
public interface ExternalIterator<T> {
    T next();
    boolean hasNext();

On the other hand, using internal iterators one might define an interface something like this:

public interface InternalIterable<T> {
    void iterate(Function<T> closure);
public interface Function<T> {
    void invoke(T t);

Languages with well-integrated support for closures (such as Scala, Smalltalk, and Ruby) usually provide support for looping over their collections using internal iterators - they are, after all, easier to use in most cases - while other object-oriented languages (such as C++, Java, and C#) tend to use external iterators. Without well-integrated language support for closures, internal iterators would be too painful to use effectively. For that reason, the Java collection framework uses external iterators. But once we have closures in the language, wouldn't it be worth reversing that decision?

The answer is no, and it isn't just because it would be an incompatible change to an existing interface. As discussed above, external iterators are more flexible for some clients. The simpler code that clients can write using internal iterators is already achieved in many clients (of external iterators) due to the previous addition of the for-each loop in JDK5. For the remaining clients, simple library methods can bridge the gap between internal and external iterators. See, for example, the "eachEntry" method for iterating over the entries of a map, discussed in my earlier postings on closures. To see how easy the conversion is, here is the code to convert from an external iterator to an internal one:

    public <T> InternalIterable<T> internalize(final ExternalIterable<T> ext) {
        return new InternalIterable<T>() {
            public void iterate(Function<T> closure) {
                for (ExternalIterator<T> it = ext.iterator(); it.hasNext(); ) {

Iteration using internal iterators is often much easier to implement, because the iterator implementation doesn't have to explicitly store and manage the state of the iteration. Much of the complexity in the implementation of the iterators for Java's HashMap and TreeMap (and their Set cousins) would simply vanish if the iterators were internal. For that reason, it is interesting to see if it is possible to have the iterator implemented internally, but exposed to the client externally, by writing a utility method that converts between the two iterable interfaces. This is the reverse of the conversion above. How easy this is to implement depends on the features of your programming language.

C# provides a "yield return" construct that helps provide the convenience of implementing internal iterators and the flexibility of using external iterators. But it is not quite powerful enough to bridge the gap between them. See notes from Cyrus Najmabadi's attempt to do so. Neither are simple (local) byte-code rewriting systems such as Aviad Ben Dov's Yielder Framework for Java. You can do it using continuations, coroutines, or fibers. But Java doesn't have them.

You can solve the problem in Java by resorting to the use of a separate thread to simulate coroutines. The result is messy and expensive, as each converted external iterator requires its own thread. Here is my implementation; can you do better?

Thursday, July 05, 2007

Constructor Type Inference

One of the ideas for improving the Java Programming Language is "type inference" on variable declarations. The idea is to simplify a pattern of code that now appears in programs due to generics:

Map<String,List<Thing>> map = new HashMap<String,List<Thing>>();

surely we shouldn't have to give the same type parameters twice? The simplest proposal to relieve this redundancy allows

map := new HashMap<String,List<Thing>>();

This introduces the new colon-equals token and the declaration-assignment statement. The variable appearing on the left-hand-side of the statement is implicitly defined by this statement, and its type is the type of the expression on the right-hand-side. I don't like this proposal. It both goes too far and not far enough.

It goes too far in that it allows the programmer to elide the type in a variable declaration. The type in a variable declaration is valuable documentation that helps the reader understand the program, and this proposal reduces the readability of programs by allowing it to be elided. Worse, it assigns the wrong type to the variable. Following Effective Java (first edition, item 34), the type of a declared variable should be an interface type. This statement form forces the variable to be of the (likely more specific) type of the right-hand-side. Consequently, the programmer may inadvertently depend on features of the concrete implementation class when using the variable. That would make it more difficult to modify the program later by selecting a different implementation type.

This syntax doesn't go far enough because the verbosity of creating generic classes is worth eliminating in other contexts as well. Programmers today work around the verbosity by providing static factory methods corresponding to constructors:

static <K,V> HashMap<K,V> makeHashMap() {
    return new HashMap<K,V>();

This addresses the immediate problem:

Map<String,List<String>> map = makeHashMap();

Unfortunately, this idiom replaces one form of boilerplate (in variable initialization) with another: trivial static factories. A generic class is typically created more than once, so adding a single static factory can simplify the code at every creation site. But with language support, we can do better.

I propose a new form of class instance creation expression:

Map<String,List<Thing>> map = new HashMap<>();

Using empty type parameters on a class instance creation expression asks the language/compiler to perform type inference, selecting appropriate type parameters exactly as it would in the invocation of the equivalent trivial static factory.

Type inference today works on the right-hand-side of an assignment. I also propose that we enable this new form to be used in more situations by improving type inference for expressions appearing in other contexts:

  • the argument of a method call
  • the receiver of a method call
  • the argument of a constructor
  • the argument of an alternate constructor invocation

This would enable generic methods to be invoked in these contexts without providing explicit type parameters.

Saturday, May 26, 2007

Removing Language Features?

As a language grows by the addition of features, it necessarily gets more complex. After all, you can't remove existing language features because existing programs use those features, but each additional feature adds complexity. Right?

Fellow Googler Matt Shulman asked me a question about the Closures for Java specification. He observed that much of the complexity arises because of support for checked exceptions in the spec. Things like throws type parameters, disjunctive types, and throws clauses on function interfaces would be unnecessary without checked exceptions. Matt asked me if we had considered if things would be simpler without all that. At first I misunderstood his question to be referring to just the Closures specification, so I answered that the facility wouldn't fit into the language as well without support for checked exceptions.

Matt clarified that he was asking not just about removing support for checked exceptions from the Closures spec, but from the entire programming language.

There has been an ongoing debate on the utility of checked exceptions. Many people are critical of Java's checked exceptions, characterizing them as a failed experiment in software engineering. In practice, checked exceptions can result in API complexity, and programs appear to be cluttered with exception handling code just to satisfy the compiler. Some people believe checked exceptions are a good language feature but are misused, even in the JDK. With the "experts" being such poor role models, how can we expect ordinary Java programmers to do better?

We did a Google search to see how many people have written in support of checked exceptions and how many people don't like them. The discussion seems to be lopsided against checked exceptions, but on the other hand that may be due to the fact that checked exceptions are the status quo.

This isn't a question I had thought much about. I believe the language could be simplified by treating all exception types as unchecked without breaking existing programs. This could also result in a simplification of future language extensions and APIs. But would the language and platform be better off without checked exceptions?

Sunday, May 20, 2007

A Limitation of Super Type Tokens

Watching Josh Bloch's presentation at JavaOne about new topics in the second edition of Effective Java makes me want to go out and get my own copy. Unfortunately, he's not scheduled to have the new edition in print until later this year.

There was a coincidental adjacency between two slides in Josh's talk that made me think a bit more about the idea of Super Type Tokens. The last slide of his discussion of generics gave a complete implementation of the mind-expanding Typesafe Heterogenous Containers (THC) pattern using Super Type Tokens:

import java.lang.reflect.*;

public abstract class TypeRef<T> {
    private final Type type;
    protected TypeRef() {
        ParameterizedType superclass = (ParameterizedType)
        type = superclass.getActualTypeArguments()[0];
    @Override public boolean equals (Object o) {
        return o instanceof TypeRef &&
    @Override public int hashCode() {
        return type.hashCode();

public class Favorites2 {
    private Map<TypeRef<?>, Object> favorites =
        new HashMap< TypeRef<?> , Object>();
    public <T> void setFavorite(TypeRef<T> type, T thing) {
        favorites.put(type, thing);
    public <T> T getFavorite(TypeRef<T> type) {
        return (T) favorites.get(type);
    public static void main(String[] args) {
        Favorites2 f = new Favorites2();
        List<String> stooges = Arrays.asList(
            "Larry", "Moe", "Curly");
        f.setFavorite(new TypeRef<List<String>>(){}, stooges);
        List<String> ls = f.getFavorite(
            new TypeRef<List<String>>(){});

But on the very next slide, the very first bullet of the summary of his presentation reminds us

  • Don't ignore compiler warnings.

This was referring to Josh's advice earlier in the presentation not to ignore or suppress unchecked compiler warnings without trying to understand them. Ideally, you should only suppress these warnings when you have good reason to believe that the code is type-safe, even though you might not be able to convince the compiler of that fact.

The method Favorites2.getFavorite, above, is annotated to suppress a warning from the compiler. Without that annotation, the compiler complains about the cast to the type T, a type parameter. Is this code demonstrably type safe? Is it possible to cause this cast to fail using code that is otherwise completely type safe? Unfortunately, the cast is not safe:

class Oops {
    static Favorites2 f = new Favorites2();

    static <T> List<T> favoriteList() {
        TypeRef<List<T>> ref = new TypeRef<List<T>>(){};
        List<T> result = f.getFavorite(ref);
        if (result == null) {
            result = new ArrayList<T>();
            f.setFavorite(ref, result);
        return result;

    public static void main(String[] args) {
        List<String> ls = favoriteList();
        List<Integer> li = favoriteList();
        for (String s : ls) System.out.println(s);

This program compiles without warning, but it exposes the loopole in the type system created by the cast to T in Favorites2.getFavorite. The compiler's warning does, after all, tell us about a weakness in the type safety of the program.

The issue is a subtle one: TypeRef treats two types as the same when the underlying java.lang.reflect.Type objects are equal. A given java.lang.reflect.Type object represents a particular static type appearing in the source, but if it is a type variable it can represent a different dynamic type from one point in the program's execution to another. The program Oops exploits that mismatch.

The Super Type Token pattern can be redeemed by disallowing the use of type variables anywhere in the Type object it stores. That can be enforced at runtime (but not at compile time) in the constructor.

Perhaps a better solution would be to reify generics (i.e., "erase erasure") in the language, making all this nonsense unnecessary.

Friday, April 27, 2007

A Consensus Closures JSR Proposal

I had set aside work on the closures prototype for a couple of months to write a JSR proposal that represents a consensus among the folks thinking about this area. You can find it at One of the things I learned is that unanimous agreement is rarely possible. There are those who feel that nothing should be done to the Java programming language, and such people will not be swayed by simple but powerful additions. Our latest JSR proposal comes as close to achieving consensus as I believe possible. All but one of the authors of the three widely-discussed closures proposals have agreed to support it.

The purpose of the JSR proposal is to define the problems to be solved and circumscribe the permitted solution space. It doesn't mandate a particular solution, though it does offer the Closures for Java specification as an example of a solution to many (but not all) of the problems. This should not be surprising, as that spec was written specifically in an attempt to satisfy the requirements. Still, the spec is a work in progress.

So what is next? I hope we'll have some active discussion at JavaOne about where to go from here.

Thursday, March 29, 2007

Closures for Organizing Your Code

Much of the discussion of Closures in Java has been about they way they affect public APIs. But there is another aspect that is just as important: the way closures affect private APIs between parts of your program. Closures often enable a tremendous simplification of a program design compared to what would be required in their absence. The following describes my implementation of a graph algorithm for computing the Jeffersonians of a graph using algorithm K from Knuth's The Art of Computer Programming, volume 4B, section 7.5.7.

As you may be aware the set of Jeffersonians of a graph is best computed using a complex recursive algorithm. Although recursive algorithms can be translated into algorithms without using recursion (Java without recursion remains Turing-complete), the recursive version of the algorithm is much shorter and easier to understand. We're lucky to be living in an age in which virtually all programming languages support recursion. Though details of the implementation are not important, my implementation went something like this:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, result);
    return result;

The idea is that the recursive part of the algorithm can pass around the collection into which the result will be placed, and every Jeffersonian that is found will be placed into the collection:

private void findAllJeffersoniansInternal(
Graph g, Collection<Jeffersonian> result) {
// complex recursive algorithm here
Jeffersonian found = ...;
// more complex recursion here

One pot of coffee and an all-nighter later I had this working like a charm. The next day my tech lead asked me to add an API element that determines whether or not a graph has a Jeffersonian or not. That was easy:

public boolean hasJeffersonian(Graph g) {
    return findAllJeffersonians(g).size() != 0;

This didn't pass code review. The problem is that this new method is to be used in the inner loop of Google's über-secret application that will take over the world. Never mind that. The problem is performance. Determining whether or not a graph has a Jeffersonian can be done in linear time, but enumerating all of them requires quadratic time (or worse). But my implementation does it the hard way. By then it was Friday afternoon and I really wanted to head home for a glass of wine, so I did what any self-loathing software engineer would do: I cut and pasted the complex recursive code in findAllJeffersoniansInternal into hasJeffersonianInternal and added a boolean return value (true when a Jeffersonian was found). Then I added logic to short-circuit the rest of the algorithm once a Jeffersonian had been found at any step. The code was messy but workable, and I had it passing tests in less than an hour. The code duplication left me somewhat uncomfortable, but the two methods were different enough that merging them would have been hard. I considered adding a second flag so I could have one body of code to do both versions, but I decided to leave that refactoring until Monday.

Something very strange happened over the weekend, though. On Monday my pointy-haired boss told me there was both good news and bad news, and asked which I wanted first. Knowing how these jokes work (the second one always trumps the first) I asked for the bad news first. The bad news was that my machine had crashed, losing all of my work from Friday. Including my implementation of hasJeffersonian. The good news was that my machine had been replaced with a brand new one, a fast new 40-core workstation, and it came with JDK7 preinstalled. I had been using JDK6 before, so I was eager to try the new Java language features.

Taking a fresh look at the problem of writing hasJeffersonian, I decided to refactor the original program to pass a closure instead of a collection:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, { Jeffersonian j => result.add(j); });
    return result;

private void findAllJeffersoniansInternal(
Graph g, {Jeffersonian => void} foundJeffersonian) {
// complex recursive algorithm here
Jeffersonian found = ...;
// more complex recursion here

Then I realized I could use the nicer syntax allowed for passing a closure to a method:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(Jeffersonian j : g) {
    return result;

Solving the second problem was then trivial:

public boolean hasJeffersonian(Graph g) {
    findAllJeffersoniansInternal(Jeffersonian j : g) {
return true;
} return false; }

That was the entire implementation. I had a strange sense of elation, but I couldn't quite tell why. I could no longer remember why the problem was so messy on Friday. This refactoring seemed trivial, and this code was so clear. What made it so hard before?

Then I woke up. It's 2007, not 2009. JDK7 is barely a gleam in the eye of Sun. My machine is only dual-core. Consensus on closures is elusive. As far as I can tell, there isn't any such thing as a graph's Jeffersonian, or a Google plan to take over the world. It's Monday morning, and I have to figure out how to merge two almost-copies of a big recursive algorithm.

But on the bright side, my boss is a really nice guy.

Friday, March 16, 2007

A Compact Object Comparator

Every now and then a problem arises where the right solution would be to impose an arbitrary total ordering on a collection of objects. The simplest example of this is when you need to sychronize on more than one object, all at the same time, to maintain some consistency condition across those objects. Using Closures, you might invoke a utility method like this:

Locks.withLocks(lock1, lock2) {
    // code protected by both locks

To avoid deadlock, every piece of code that locks the same set of locks should do so in the same order. Rather than forcing all callers of the withLocks method to worry about getting them in the right order, the implementation of withLocks can sort the incoming locks. Then the caller can just pass the locks in arbitrary order, knowing that they will be locked "in the right order". It doesn't actually matter what order we sort them in, as long as we always get the same order for the same objects. The implementation of withLocks can use Collections.sort to sort the incoming locks, but java.util.concurrent.locks.Lock is not naturally comparable, so we need to pass an appropriate comparator to sort. We need a java.util.Comparator<Lock>, but a java.util.Comparator<Object> would work just as well. Let's specify, and then implement, a suitable comparator. Here is what we need:

 * Returns a comparator that imposes a complete order on all objects.
 * Each invocation of this method may yield a distinct comparator,
 * or may yield the same comparator.
public Comparator<Object> totalOrder() { ... }

How are we going to do this? One idea is to create an assignment of long values to each object, as needed. That would look something like this:

public Comparator<Object> totalOrder() { return new TotalOrder(); }
private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    public int compare(Object o1, Object o2) {
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        return nonce;

There are two major problems with this approach. First, it causes object retention. Objects whose space would otherwise be recovered by the garbage collector are retained because they are reachable as keys in the codes map. We can't fix this by simply using a WeakHashMap; without the identity semantics of IdentityHashMap the technique doesn't work. We really need WeakIdentityHashMap for this, but no such class exists in the JDK yet. Fortunately, "crazy" Bob Lee has come to the rescue with an implementation of this concept inside the recently open-sourced Guice dependency injection framework. I think this belongs in the JDK, and now is the time to propose it for JDK7.

The other problem with this implementation is that this utility takes up too much space. In general, every time you call the compare method one or two objects might be created and added to the map.

Another idea for implementing this utility is to sort the objects based on their identity hash code. Identity hash codes are well distributed, almost like random numbers. That is naturally thread-safe, and would look something like this:

private class TotalOrder implements Comparator<Object> {
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        return (i1<i2) ? -1 : (i1==i2) ? 0 : 1;

This is much more compact than the previous approach. But because identity has codes are not guaranteed to be unique, it occasionally treats two distinct objects as equal.

We can get the best of both worlds - a space-efficient comparator and a complete order - by combining the two approaches:

private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        return nonce;
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        if (i1 != i2) return (i1<i2) ? -1 : 1;
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);

By the way, if you haven't already checked it out, see "crazy" Bob Lee's Guice dependency injection framework. We use it extensively at Google. By really taking advantage of recent language features such as generics and annotations, the Guice framework is very flexible and yet much simpler than existing frameworks. Throw away your XML and write your Java code in Java!

thanks to "crazy" Bob Lee for contributing the Guice framework, and for reviewing this essay.

Thursday, March 08, 2007

On The Expressive Power of Programming Languages

There are at least three separate proposals recently put forward in the space of "Closures for Java." Among the criteria for evaluating the proposals, I'd like to discuss two: conciseness (possibly phrased as convenience) of code using the construct, and expressiveness. Conciseness is pretty obvious, and you can compare the proposals on this measure by writing snippets of code that do basically the same thing as each other, but written using existing constructs and then each of the proposed constructs. How many characters, or tokens, does it take to write the code? By the measure of conciseness, shorter is better.

Unfortunately, the authors of the various proposals don't appear to be using a common meaning for "expressiveness" or "expressive power." Consequently, we often end up talking at cross-purposes when comparing the proposals. Some people appear to treat "expressiveness" and "conciseness" as synonyms, but to me these have completely different meanings. Expressiveness is a bit harder to measure, but in some ways more important at this stage of the discussion. See Matthias Felleisen's, On the Expressive Power of Programming Languages, 3rd European Symposium on Programming, Copenhagen, Denmark, 1990,, for one attempt to formally capture the meaning of expressiveness.

In my mind, a language construct is expressive if it enables you to write (and use) an API that can't be written (and used) without the construct. In the context of the Closures for Java proposed language extension, control abstraction APIs are the kind of thing that don't seem to be supported by the competing proposals. You don't see the proposals compared side-by-side on this measure because this is something only supported by one proposal. Programmers who have become accustomed to programming with closures find them very useful for factoring out common code in ways that are not currently possible in Java. See, for example,,,, and These kinds of uses might not occur to you if you're mainly a Java programmer, because Java doesn't reward you for thinking this way. But this is another example of expressive power.

I'm not particularly attached to one syntax or another for closures. I don't mean to say that syntax isn't important. Anyone who knows the story of variance and wildcards knows how much I value a good surface syntax. Our proposal describes a particular syntax not because we believe it is the best possible syntax, but because it is hard to write a specific proposal without some syntax. Ultimately, I hope the closures issue becomes a JSR and the expert group takes its time to decide what surface syntax is best. But I believe that the expressiveness of the Closures for Java proposal is the most important reason to consider doing anything in this space at all. If it is just a matter of a slightly more concise syntax, I'm not sure it is worth the trouble.

Monday, March 05, 2007

Java Closures versus MouseListener

The Closures for Java proposal simplifies the code for many purposes where anonymous class instance creation expressions are currently used. When the anonymous class's supertype is an interface with a single abstract method, a closure can be used directly. But if the supertype is a class, like java.util.TimerTask, or has more than one method, like java.awt.event.MouseListener, then you can't use a closure directly. You can still use an anonymous inner class directly, as always, but there are ways of using closures that may be more convenient. For TimerTask, a client can be written this way

void printHelloAfterDelay(java.util.Timer timer, long delay) { timer.schedule(TimerTask.of({ => System.out.println("Hello"); }), delay); }

if we add the following utility method to TimerTask:

public TimerTask of(final Runnable block) { class ClosureTimerTask extends TimerTask {
public void run() {; }
return new ClosureTimerTask(); }

Similarly, Peter von der Ahé showed me how to use the builder pattern, along with closures, to simplify handling mouse events:

void addSomeActions(java.awt.Component foo) { foo.addMouseListener(new MouseListenerBuilder() .setMouseClicked({ MouseEvent e => System.out.println("Mouse clicked"); }) .setMouseReleased({ MouseEvent e => System.out.println("Mouse released"); }) .setMouseEntered({ MouseEvent e => System.out.println("Mouse entered " + e.getComponent()); })); }

The implementation of MouseListenerBuilder is left as an exercise to the reader.

Monday, February 05, 2007

Closures Spec Update (v0.5)

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at

We've just updated the Closures for Java specification, bringing it to v0.5. There are two significant changes:

  1. We've dropped the nominal version of the specification. We are no longer maintaining parallel versions of the specification (with and without function types) because the most significant concerns regarding function types were resolved in earlier revisions of the spec.
  2. We added support for user-defined looping APIs. I wrote about this in October 2006, but did not integrate that into the spec until now.

There is now a two-hour version of my Closures for Java talk on video. It is the same as the one-hour version but with questions and answers both during and after the talk.

Sunday, January 28, 2007

A Definition of Closures

There has been some confusion over our proposal to add closures to the Java Programming Language. After all, doesn't Java already have closures in the form of anonymous inner classes? What is the point of adding something to the language that it already has? To some there appears to be a lot in the proposal that has nothing to do with closures, including the control invocation syntax, null as a type, Unreachable, throws type parameters, function interface types, and "nonlocal" returns. In my Javapolis talk I tried to give an explanation for why these features are in the proposal from the practical point of view of what kinds of things would be possible that were not formerly possible. But that begs the question: why do we call it "Closures" for Java? In this blog post I'll try to show how the definition of closures relates to the features of the proposal, and identify which features of the proposal do (and which do not) result from the definition.

Before discussing the definition of closures, it helps to understand the historical context in which the term was introduced.

Lisp was created in the late 1950's by John McCarthy and others at M.I.T. One feature of the language was function-valued expressions, signified by lambda. The name "lambda" was borrowed from a mathematical formalism known as the lambda calculus. Although Lisp was not based on an effort to model that formalism, lambda plays approximately the same role in Lisp as it does in the lambda calculus: lambda is the syntax for a function-valued expression. McCarthy's intent was that Lisp should be designed to be implemented very efficiently, ideally compiled. That desire for efficiency influenced the design of the language.

Lisp used something called dynamic scoping. Logically, in a dynamically scoped language, when a variable reference is evaluated the runtime looks up the call stack until it finds a scope in which a variable of that name is defined. But as a practical matter variable references in a dynamically scoped language can be resolved in constant time simply by maintaining a value cell for each variable name; that value cell caches the variable's current definition. Dynamic scoping is easy to implement in an interpreter or compiler. Some very clever people had found ways to not only take advantage of dynamic scoping, but had developed what would now be thought of as programming patterns that depended deeply on it. But it was soon discovered that dynamic scoping suffered subtle problems, something the Lisp community called the FUNARG problem.

Now we fast-forward to the mid 1970's. On the radio you would hear1 Elton John, Emerson Lake & Palmer, Joni Mitchell, The Captain and Tennille, John Denver, Paul Simon, Paul McCartney and Wings, ABBA, David Bowie, Janis Ian, Aerosmith, Fleetwood Mac, Heart, and Queen. A number of popular Lisp dialects were in use including InterLisp, MacLisp, UCI-Lisp, Stanford Lisp 1.6, and U. Utah's Standard Lisp. All of them were dynamically scoped. It was in this context that Guy Steele and Gerald Jay Sussman developed Scheme, a very simple Lisp dialect.

One thing about Scheme was different2. Scheme was lexically scoped, like the lambda calculus and most mathematical notations, which means that a variable reference binds to the lexically enclosing definition for that name that was active at the time the enclosing lambda form was evaluated. To explain the semantics in terms of the implementation, evaluating a lambda expression was said to produce a closure. This is a function value represented as an object that contains references to the current bindings for all the variables used inside the lambda expression but defined outside it. These are called the free variables. When this closure object, or function, is applied to arguments later, the variable bindings that had been captured in the closure are used to give meaning to the free variables appearing in the code. The term closure describes more than just the abstract language construct, it also describes its implementation.

To many in the Lisp community at the time, it didn't make sense to adopt a Lisp dialect with closures. Not only would it undermine common programming techniques but it would obviously be much less efficient. For a short time these issues were debated, and Guy Steele wrote a series of papers entitled Lambda the Ultimate _____ (where _____ is Imperative, Declarative, GOTO, or Opcode) to help explain the power of lexically scoped lambda (closures). Fast forward only a few years and the debate was largely settled: lexical scoping is Right and dynamic scoping is Wrong and we've all learned our lesson. Since that time the word closure is used to mean lexically scoped anonymous function, but the connotation is that it is possible to get the semantics wrong for any number of reasons, including bugs and concerns about implementation efficiency. It also hints that we should let the language design drive the implementation, not the other way around. Virtually every programming language, whether or not it has something like lambda and anonymous function values, uses lexical rather than dynamic scoping. The basic definition of a closure, however, shows its Lisp roots:

A closure is a function that captures the bindings of free variables in its lexical context.

Around this time, Smalltalk was introduced. Smalltalk is the most pure and simple of the object-oriented languages: everything is an object. Object-oriented languages add a twist to lexical scoping. Rather than binding all names in the lexical scope, free variables appearing in methods are bound in the scope of the object that the method is a member of. In other words, names in a method are bound to members of the "current" object. The current object is accessible by the name "self". Another small but interesting detail is that you can return early from a method in Smalltalk using the syntax "^expression". We'll return (no pun intended) to the significance of this fact later.

Methods aren't the only kind of code abstraction in Smalltalk. There is also an expression form for writing a block expression, which is essentially a lambda. Early dialects had limitations on them, but most modern Smalltalks do not. They are a true analog to Scheme's lambda. Free variables in a Smalltalk block are bound in the enclosing scope, which is typically the scope of some enclosing method. The result of evaluating a block expression is a closure, and like everything else it is an object. In this case the object has a method that you use to invoke the code of the block.

Anonymous functions (closures) were not blindly introduced into Smalltalk just because it seemed like a neat idea, or because they had worked out well in another language. Rather they were integrated fully and carefully into the language. Anonymous functions can properly be integrated into even an existing language, but there is an advantage when adding them early. As Guy Steele's papers demonstrated, they are so powerful that they subsume other language features. If you add them early, you might save yourself the trouble of adding language features that can instead be added as libraries. Smalltalk provides few control constructs directly in the language. Even the conditional "if" is provided as a library method and invoked using blocks.

Two things distinguish blocks in Smalltalk from Scheme's lambda. First, the meaning of "self" within a block refers to whatever meaning it had in the enclosing context. Specifically, it doesn't refer to the closure object itself. Second, the syntax for returning from a method, "^expression", returns from the enclosing method; it doesn't return from the method representing the closure invocation. These two details are a natural consequence of the fact that, while Scheme has only one lexically scoped language construct (variable bindings), Smalltalk has three lexically scoped language constructs: name bindings (like Scheme), the referent of the return syntax, and the meaning of "self". The definition of closures above mentioned only "the bindings of free variables", but that is because the definition was written for the language Scheme, and name (variable) binding is the only lexically scoped construct in Scheme. Common Lisp also has "return" and "goto", and these too are captured lexically in a closure. In order to realize the full power of closures, described in Guy Steele's lambda papers, they must capture all lexically scoped language constructs. Generalizing the definition of closure to cover other languages would require using more language-neutral terminology: instead of "bindings of free variables" we would have something like "lexically scoped semantic language constructs." However, that obscures the origins of the term.

Fast forward more than 25 years, and we're once again listening to some of the same music we listened to in the late 1970's. We are now considering adding closures to Java, a significantly more complex language than either Scheme or Smalltalk. We're not considering them because they seem like a neat idea, or because they worked out well in other languages, or because we're bored. Rather we're considering them: because of the power and flexibility they will add to the programmer's arsenal; because of the improved readability we expect from programs that use closures instead of the existing alternatives; and because of a number of other recently proposed language extensions that will be unnecessary if closures are added. In order to get the full power of closures, they should capture all lexically scoped semantic language constructs. What are the lexically scoped language constructs in Java?

  • The meaning of variable names.
  • The meaning of method names.
  • The meaning of type names.
  • The meaning of this.
  • The meaning of names defined as statement labels.
  • The referent of an unlabelled break statement.
  • The referent of an unlabelled continue statement.
  • The set of checked exceptions declared or caught.
  • The referent of a return statement.
  • The definite assignment state of variables.
  • The definite unassignment state of variables.
  • The reachability state of the code3.

In addition, Java has one other significant difference from either Scheme or Smalltalk: Java is statically typed. That means that each expression has a type at compile-time. So if we add closures, we need to have some appropriate type for a closure. Since a closure is an anonymous function, it is natural to consider adding function types to the language. But this is not a mandate. As you can see by the two variations of our closures proposal (the nominal and the functional versions) we believe it is possible to add closures without adding function types with a limited loss of functionality (higher-order programming becomes impractical). Our proposal for closures addresses every item on this checklist. There are additional features of our proposal (the control invocation syntax and the closure conversion) that don't relate directly to the definition of closures, but which make them integrate very nicely with existing language features. And there are additional features not mentioned in the spec (such as proper tail recursion) that would be helpful to realize the full potential of closures.

What about anonymous inner classes? It turns out that they don't pass muster on any item on this checklist. Let's set aside the fact that local variables from enclosing scopes must be final to be used inside an anonymous class. The problem is that variable names are simply not resolved in the correct scope. They are resolved in the scope of the anonymous class that you're creating, not the enclosing scope. If you're creating an instance of an interface then it's probably not too much of a problem because most interfaces don't have any (constant) variable definitions. But anonymous inner classes fail every other item on this checklist as well, most of them fatally. Most alternative proposals don't actually address any of the items on this list, and so fail to provide the power of closures any more than existing language constructs.

Setting aside all the programming language theory, don't anonymous inner classes provide, in practice, all of the advantages of closures? I believe I've already shown that the answer is no. It is certainly true that for any program you can write using closures, you can write a roughly equivalent program using anonymous inner classes. That's because the Java programming language is Turing-complete. But you will probably find yourself resorting to a significant and awkward refactoring of the code that has nothing to do with the purpose of the code. In fact, you can write a roughly equivalent program using assembly language if you have the stomach for such an effort. On the other hand, true closures increase the power of a language by adding to the kinds of abstractions you can express.

Acknowledgments: my thanks to Gilad Bracha, John Rose, and Guy Steele for filling me in on and fact-checking the terminology and relevant history. Any remaining historical fantasies are my own.

1 This is Guy Steele's impression of the late 1970's music era [personal communication].

2 Scheme was the first lexically scoped Lisp, but certainly not the first lexically scoped programming language. Algol60, for example, was lexically scoped. See also Landin's The Next 700 Programming Languages.

3 Arguably part of the lexical semantics or not, reachability state is valuable to capture in practice.

Friday, January 19, 2007

Video Interview at Javapolis

Ted Neward interviewed me at Javapolis last month, and the video has just been posted:

  1. Who's your friend? (frog on my shoulder)
  2. Who are you and what do you do?
  3. What are you presenting at Javapolis?
  4. Why are Closures for Java important?
  5. What are the differences between the two Closures proposals?
  6. What kinds of problems is your proposal trying to solve?
  7. Is Java becoming too complex?
  8. Did Generics add complexity to the Java language?
  9. Can OpenJDK lead to a fragmentation of the Java platform?
  10. How does it feel to have your code now splashed out in the open?
  11. What should I learn to be able to change the Java compiler?

You can jump to any section of the interview, or just skip the whole thing.

Tuesday, January 16, 2007

Primate Parts

Recently Chris Lamb and friends wrote about their experience adding a feature to javac, a pastime slightly more popular than it used to be now that javac's sources have been opened under the GPL. And he found something strange:

"Anyway, it turns out that the javacc [sic] code is messy. Really really messy. But it’s the source of great amusement though, not only from the scary amount of no-op casts, misleading indenting and undocumented functions, but the lexical token for the ‘@‘ symbol is ‘MONKEYS_AT‘. No, we have no idea either."
I responded with a conditional promise to tell him the story:
"Actually, the indentation is consistent if you have your tabs set at 8 spaces, where God intended them.
"There’s a story behind MONKEYS_AT, and if you know it this little piece of code is a funny inside joke. But if you want me to tell you, you’ll have to take back your assertion that javac’s code is messy and tell me that it’s a work of art."
Chris quickly buckled to the pressure, responding by email:
"Yes, it is true - when set it to 8 spaces, it seems to look a bit
better. What I mean to say is, it's now a work of art and any blemishes
are my fault. :)

"Anyway, yes, my friends and I would really like to know this story
behind the naming of the token though -- we found it at about 3AM whilst
hacking on the javac code and it put us off our stride somewhat. ^_^"
Here's the formerly untold story of MONKEYS_AT.

During the development of the JDK5 language features, the Sun team had regular meetings with a team from Denmark that was designing and implementing the variance feature (since renamed wildcards, which is a longer and more interesting story).  You can find the team member names in the paper describing the work. The Danish team was led by Mads Torgersen, and we all enjoyed a number of evenings chatting over beer. During one such session, I was discussing the work I was doing to implement annotations, and I mentioned that, unlike the "#" character that seems to have many names, there don't seem to be any alternative names for the "@" character. Mads told us that in Denmark, there are a number of names for this character, including the archaic "monkey's ass", which refers to the similarity in appearance of this character to the rear end of a monkey. We all thought this was hilarious, but perhaps a bit too risque to put in corporate-developed and publicly-visible sources. But it was just too funny to leave out. Thus I came up with the pun MONKEYS_AT. To this day that little inside joke in the sources reminds me of the team and the time we spent together.
Mads, by the way, is the one on the left. ;-)
There you have it: a disinterested observer, inclined to believe otherwise, comes to appreciate the beauty and humor of javac.

Tuesday, January 09, 2007


In Smalltalk, the name of a method being invoked is interleaved with the arguments passed to the method. Consequently it is difficult to confuse the order of arguments. In Java, on the other hand, when you invoke a method that accepts three integers it is easy to get the order wrong. The compiler has no way to detect the problem, so APIs must be carefully designed with the artificial constraint that one should avoid "too many" arguments of "compatible" types. In the context of closures, Smalltalk's syntax allows "built-in" statement forms such as if-then-else to be expressed as an ordinary method call. When we were putting together the original version of the closures proposal James Gosling suggested this idea to support do-while and if-else style syntax of user-defined control abstraction methods, something that was mentioned in the further ideas section. We placed this issue on the back burner once we found a nice syntax that works for many of the control-invocation use cases, but a recently submitted comment by Stefan Schulz on my blog reminded me of this issue. His use case is that he'd like to be able to write an API that allows him to refactor this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    boolean first = true;
    for (String s : someCollection) {
     if (first) {
            first = false;
        } else {
            sb.append(", ");
    return sb.append("]").toString();

into this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    for each(String s : someCollection) {
    } inBetween {
        sb.append(", ");
    return sb.append("]").toString();

Presumably, the API method would be defined something like this:

<T> void for each(Iterable<T> it, {T=>void} body) inBetween({=>void} between) {
    boolean first = true;
    for (T t : it) {
     if (first) {
            first = false;
        } else {

A related advantage of the Smalltalk syntax is that operator overloading comes almost for free. If operator overloading is on the table for JDK7, perhaps we can kill two birds with one stone, by making the name before the first argument optional:

static BigDecimal (BigDecimal left) plus (BigDecimal right) {
    return left.add(right);
static BigDecimal (BigDecimal left) times (BigDecimal right) {
    return left.multiply(right);

This would allow you to write code like this:

static BigDecimal f(BigDecimal x, BigDecimal y, BigDecimal z) {
    return (x) plus ((y) times (z));

It's probably a small step from here to allowing arbitrary symbols as operator names and eliding some parens. I don't think anything is required in the VM, as we can encode these method names using some non-identifier character in the VM signature. For example, the above methods could be translated in the VM to methods with the names "each~~inBetween", "~plus", and "~times" (the number of tilde characters is the number of arguments before "parts" of the name in the method signature).

There are difficult syntax issues (for example, the each-inBetween example can also be parsed as two separate statements) and I'm not sure I would recommend any of this, but I wanted to share the idea.