Sunday, November 05, 2006

Reified Generics for Java

Many people are unsatisfied with the restrictions caused by the way generics are implemented in Java. Specifically, they are unhappy that generic type parameters are not reified: they are not available at runtime. Generics are implemented using erasure, in which generic type parameters are simply removed at runtime. That doesn't render generics useless, because you get typechecking at compile-time based on the generic type parameters, and also because the compiler inserts casts in the code (so that you don't have to) based on the type parameters.

Generics are implemented using erasure as a response to the design requirement that they support migration compatibility: it should be possible to add generic type parameters to existing classes without breaking source or binary compatibility with existing clients. I wrote about this two years ago. Migration compatibility is exploited widely in JDK5; all of the collection classes and interfaces use generics, yet existing code using collections continues to work. Without migration compatibility, the collection APIs could not be retrofitted use generics; we would probably have added a separate, new set of collection APIs that use generics. That was the approach used by C# when generics were introduced, but Java did not take this approach because of the huge amount of pre-existing Java code using collections.

While solving one set of problems, erasure adds a set of its own problems. For a type parameter T, you can't write a class literal T.class. You can't use instanceof to test if an object is of type T. And you can't create an array of T. But it gets worse. You also can't write class literals for generic types like List<String>.class, or test if an object is an instanceof List<String>, or create an array of List<String>. The implementation of generics using erasure also causes Java to have unchecked operations, which are operations that would normally check something at runtime but can't do so because not enough information is available. For example, a cast to the type List<String> is an unchecked cast, because the generated code checks that the object is a List but doesn't check whether it is the right kind of list.

It isn't too late to add reified generics to Java. There are two basic approaches. The first uses the language as it exists today but adds the generic type information at runtime. In the ideal world, we wait until every bit of Java code in the world has been modified to use generics safely, and then go through a transition in which we start recording type parameters at runtime by using a new javac and VM. There are two main difficulties with this approach. First, it is not compatible. It is not source compatible because you would no longer be allowed to use raw types (i.e., it is impractical to wait until all code has been generified); it is not binary compatible because new clients would invoke new kinds of constructors for generic classes that also record the type parameters, but existing classes do not have those constructors. (A hybrid approach is being investigated in which the system records type parameters only when they are available, but I haven't yet seen a workable proposal along these lines.) The second problem is more insidious: lots of existing code uses generics but uses them in an unsafe way. For example, I have seen code that creates an ArrayList<Object>, but later casts it to a List<String> where the author knows that it only contains objects of type String. I would not be surprised to find such code in the JDK. Such code must fail at runtime when generics are reified, so some existing (but working) code would be broken.

The other approach modifies the language so that the declaration of a generic parameter distinguishes reified type parameters from erased ones. It is a pure extension of the language. The existing syntax for generics would continue to designate generics by type erasure, but a newly introduced syntax would be used to declare reified type parameters. Perhaps something like this:

class NewCollection<class E> extends Collection<E> { ... }
class NewList<class E> extends NewCollection<E>, List<E> { ... }

The rules for these reified type parameters are straightforward. When using a reified generic class, the type argument has to be a reified type. You would not be allowed to create a NewCollection in its raw form. Code that uses only reified types would never get an unchecked warning from the compiler, because the compiler can always generate the correct checking code. If you have a reference of type Collection that refers to a NewCollection<String>, you would get a runtime error if you attempt to put anything other than a String into it. In this sense reified collections are like java.util.Collections.checkedCollection, only better because the element type can itself be generic and the guarantees are stronger. You can even create an array of E inside an implementation of this type. But there is a disadvantage: the strategy requires a new set of reified Collections APIs parallel to the existing erasure-based ones. This is why C# introduced new collections APIs when they introduced generics.

As the above declaration illustrates, the new reified collection interfaces could be extensions of the existing collection interfaces. Consequently, existing APIs that receive old collections can be passed new collections. For example, you could pass a NewCollection<String> to an API that is declared to receive a Collection<String>. In addition, the new reified collection classes (that is, the implementations) could be extensions of the existing collection classes. That allows the implementation of the old and new collection classes to be shared, and provides a nice migration path. Existing APIs that return old collections can be modified to return new collections. Over time, most libraries can be migrated to use the new APIs without breaking backward compatibility (though it would break unsafe clients). In this way we can have a more gradual migration than would be possible with the first approach.

I don't mean to trivialize the work involved: this requires a significant revision of the VM and language specifications and a significant effort from the teams who implement VMs, java compilers (i.e. javac and Hotspot), and the core JDK libraries. From the programmer's point of view, however, the language change is small and mostly involves removing restrictions.

Approaching the problem this way has a secondary benefit. Because the new reified collection interfaces don't exist yet, it is possible for methods to be added to them when they are introduced. Perhaps sorting methods belong in NewList? If reified generics are added at the same time as (or after) closures, a number of useful methods could be added to take advantage of closures. Filters, mappers, reducers, and visitors are just some of the ideas.

56 comments:

Anonymous said...

Maybe at the same time as new improved generics, we could have new improved arrays that fit better with generics, something like this RFE. So an array T[] could appear, at the language level, to be just syntax sugar for a List[T].

I'm no expert, but it seems to me that this would involve similar compatibility issues and VM changes as removing erasure would.

Furthermore, maybe features such as true multidimensional arrays could be added at the same time. Supposedly this could really improve Java performance in scientific computing, if that's still a goal.

Mic said...

Encounter generic problems in closure specification? Mmm... just my guess.

I'm happy if NCollection(learned from nio) interface is introduced. so no more more ungly for xxxxx(...) loops.

Anonymous said...

Was this second solution brought up during the development of generics for Java? If so, was it merely disregarded as a way to control scope in Java 5? Why wasn't this discussed, then, for Mustang? If it was not brought up several years ago, why not?

Anonymous said...

This certainly is a very interesting proposal, but I was wondering for what style of programming reification of type parameters is required? I certainly recognize that erasure is one of the most puzzling aspects of generics for beginners, but once I got familiar enough with generics and understood the design goal of compatibility (with a good book like Java Generics and Collections), I didn't miss the reification all that much anymore. Where I had previously tried to use it, it was frequently a case of bad style (e.g. instanceof-test).

I can see that reification of type parameters is required when you depend on reflection or more advanced meta-programming, but are there compelling motivating examples besides that?

Anonymous said...

One big question for me is: should this be done in Java 1.x or should a real Java 2.0 be created (and while doing that do a real clean-up of the library, i.e. remove deprecated stuff and other breaking changes)?

Fatih Coskun said...

Too late. You should have introduce reified generics in Java 5 and a different syntax for erasure generics.

I.E:
class Foo<E> //reified
class List<erasure E> //erasure

This way Java API code could use erasure generics for compatibility, and all new code would use reified generics. I assume it is too late now for changing generics this way?

adiGuba said...

Hello,


In my humble opinion, there are still a problem : it need to redefine all Generics class in order to use them with reified generics...

Why not allowing to specify if an object can be reified or not when it was invoked ?
For example :


// standard Generics
List<String> list = new ArrayList<String>();

// Double << >> means Reified Generics
List<<String>> list = new ArrayList<<String>>();


Of course reified generics will be "cast" into standard generics (for compatibility).




And for new classes, we can force to use reified generics (this will allow to use expressions like T.class, and will prohibit the use of "standard Generics") :


public class MyReifiedObject<<T>> {

  public Class<T> getType() {
    return T.class;
  }
}



PS : Sorry for my poor english

Mileta said...

I think that recent flood of RFEs for Java language indicates that we need completely new language, designed from ground up, without backward compatibility, revised libraries (remember Cloneable, Date?) etc...
Adding all these new stuff to existing Java language while preserving backward compatibility just makes it more complex than it should be.

Matthias said...

Right on, anonymous. I consider reification a big mistake. It complicates the runtime for no apparent benefit - I have to see a compelling case yet.

Instead, I prefer the opposite direction: introduce erasure on arrays (treat T[] like Object[], and stop issuing ArrayStoreExceptions). Get rid of strongly typed bytecode - reduce the JVM's type system to the primitives + Object refs. Some day you'll reach StrongTalk nirvana.

Axel said...

Reified types are prone to combinatorial explosion - your program will spend a lot of time constructing and taking apart types, and will use quite some heap space for representing nested parameterized. This is somewhat strange if you understand types as provable statements about your program; once they are proven, it should not be necessary to compute them again and again while the program is running.

Also, when types get more complex, so does the "instanceof" operator - you would have to carry the whole typechecking logic along at runtime.

Another interesting point is the representation of recursive types, e.g. "class Integer implements Comparable<Integer>" - reification would require a circular structure, which might make matching, interesting.

not quite convinced either.

Neal Gafter said...

Axel: You're quite right that a naive implementation of reification would be quite inefficient. That's not how C# does it, and neither should Java do it that way.

David Walend said...

Reified generics would be much easier to explain and more intuitive to work with.

Because we'd always be working inside the type specifier, the language could give us a new keyword. "reified" would be much clearer than "class" (and I'd expect fewer collisions with "reified" than with "assert").

Any chance we'll be able to reference reified type parameters in type specifiers? Something like http://weblogs.java.net/blog/dwalend/archive/2006/05/tilting_at_the_1.html ? It's more of a compile-time issue, but would help projects like JDigraph.

Thanks,

Dave

Chris said...

I'm with mileta, start a new Java-like language that happens to run on top of the JVM. Extend the JVM, if necessary, in compatible ways to support the new language. Obviously you want to allow your new language to import all Java classes as first class citizens.

Neal Gafter said...

Chris and Mileta: if the new language is to run on top of the JVM, support generics, and interoperate with existing Java classes, then either it implements generics by erasure or both the new language and Java are extended in this way. Given that, it isn't clear how adding a new language helps. Moreover, what is desireable to do in a new language has little impact on what ought to be done with Java.

Kevin said...

Reified generics are what everyone expects. Languages, like user interfaces, are easier to use when they do what you expect.

That said, I worry that sentiments like "what is desireable to do in a new language has little impact on what ought to be done with Java" lead us to concepts that only make sense after 5 years of Java development, rather than making sense immediately on inspection.

If we have to add keywords, let's add them and keep things readable. Fatih Coskun's suggestion is very readable, perhaps the second-best is to break everything consistently:

class Foo<E> //doesn't compile
class List<erased E>
class List<reified E>

Anonymous said...

I had always figured that rather than even trying to have the new reified "enforcement" which breaks the entire world into enforced and non-enforced that we'd just have more runtime information embedded in an instance of a Foo[T]. With enough extra information one could do new T(x), new T[], instanceof T, etc, in the implementation. One could also check what a Foo[?]'s parameterized type is at runtime, etc.

Rather than bifurcating the world into fully reified and not, it may make more sense just to have a set of "enforced" variations or wrappers to existing collections, etc, e.g. Collections.enforcedMap(Map[T,V]) that uses such runtime information to provide a wrapper for Map[T,V] that can be used by *all* code operating against Map but uses runtime checks to prevent type violations.

Sure the runtime checks are slower than compile time ones, but time could be spent heavily optimizing them. Most importantly the Java codebase would not be split into reified and non-reified sets.

Neal Gafter said...

The "checked" wrapper approach doesn't solve the problem because it can only be used for types for which you can write a type token. Those are only the reified types. That's why this requires a language change.

Calum MacLean said...

Slightly OT...

You mention the possibility of adding new methods (sort() etc.) to NewList.

Wouldn't it be better do allow "extension methods"?
So if you had a static method m1(a1, a2, a3) - probably with an appropriate annotation - then you can call this method as a1.m1(a2, a3) - i.e. write it as if you were dispatching on the first argument?
Is this being considered?

Anonymous said...

if the new language is to run on top of the JVM, support
generics, and interoperate with existing Java classes, then either it implements generics by erasure or both the new language and Java are extended in this way.


That sounds right. To be specific,
Scala is a language/implementation with all
the properties you mention, and it
implements generics by erasure.

Given that, it isn't clear how adding a new language helps.

I think Scala avoids some of the
problems you mention with erasure by having a different type system,
with definition site variance, that avoids some of the problems you mention. Some discussion of this here

http://lamp.epfl.ch/~emir/bqbase/2006/10/16/erasure.html

Count me with Chris and Mileta on
supporting a new language. Since
Scala is a working proof of concept, maybe it should just be
renamed to Java 2 or 6 or whatever.

Neal Gafter said...

Scala does not reify generic parameters, so it is not a proof of concept of addressing this problem by introducing a new language that reifies generics.

Mic said...

Even a new language using reified generic is introduced to JVM. How about existing Java library? Should this new Language with erasure generic be able to interact with this new language ?
If so, how ? erasure <===> reified
If no, why ? new language will become lack of library support and lose the main advantage of using JVM platform.

Adding a new language is good. However, it doesn't help to correct generic. We still need a good idea that help us to migrate from erasure to reified.

Mic said...

"Should this new Language with erasure generic be able to interact with this new language ?"

It should be

Should the old java language with erasure generic be able to interact with this new language.

Axel Rauschmayer said...

The last paragraph of this blog entry is my favorite part. So far, I really like how carefully Sun has extended Java---always in very useful ways. Especially with the new ideas showing up for Java 7, it is still an exciting language to follow. I used to prefer Python to Java, but with the latest Java IDEs, it is now the other way around. There are now (after the improvements currently being discussed: a new module system, super-packages, closures etc.) only a few ways in which Java is losing out to other languages such as Python, Smalltalk or CLOS:

- Multiple dispatch, first-class methods: Some design problems inherently demand multiple dispatch. How come MultiJava is not a JSR?

- Open classes: Again, MultiJava does it and I think it's useful.

- Mixins: This was one of the last core complaints about Java a friend of mine had; and he's a die-hard Dylan fan!

- Generators (not sure full-blown closures are necessary): Really useful for some iterator problems.

- A cleanly defined base library: This is where functional languages and Python excel. Things like the Apache Commons should not be necessary. Examples: Why can't I splice and join strings? Why can't I use negative indices for collections? Map, zip etc.

- Collection literals: All scripting languages have them. I think many people are so excited about scripting languages only because collection and string manipulation is very powerful there. But that's nothing that Java couldn't easily copy.

Anonymous said...

After reading your blog entry, i was wondering if couldnt we have a very very light version of reified generics by just maybe subclassing the generic types at runtime (ok technically not at runtime, but kinda like what anonymous inner classes are)? that way, the compiler can infer the type that is used when the type is instantiated, and through reflection, you can always know what type it is, etc. And then, maybe adding one or two shortcuts and allowing E.class or instanceof E would can have the best from the two worlds. Just wondering.
best regards,
takeshi

Tom Palmer said...

+1 on developing a real Java 2.0. Just give the files a new extension (".j2"?) and let javac compile both languages from one call to the command.

Tom Palmer said...

Oh, and please don't make huge interfaces with helper methods. Existing Java 5 static imports or C# 3.0 style extension methods (or some nicer variation) are much better than making huge interfaces and saying "just extend my helper abstract class".

Tom Palmer said...

One more comment, sorry. In my version of the "Java 2.0" idea, I wouldn't recommend removing deprecated items. I'm all for backwards compatibility while removing visible cruft for common cases. A new filename extension might solve this.

axel said...

I know I'm late... but Project NextGen sounds relevant (based on sound science, it seems). It allows "using generic types in type-dependent contexts, such as casts, "instanceof" tests, and "new" operations. Nevertheless, NextGen maintains full compatibility with the JVM and existing compiled binaries." Based on Java 5 generics.

Neal Gafter said...

axel: NextGen doesn't support migration compatibility.

Anonymous said...

Please god do this! Erasure totally blows.

musheno@users.sourceforge.net said...

Sounds good to me...
BTW, how are you guys planing on interfacing with legacy code? The vast majority of the software int the world today is java, pre-generics, so if you do that wont you need multiple inheritance Vector extends Collection, but now you need a Vector<<String>>?
So lets add that, and BTW, wouldent method pointers be nice? I know, we have Runnable, and reflection, but who cares, right?
One last thing, why is C++, and C# not being used by most cooperations today? I mean most of them are not running red hat (though some still are)?
Todd_Musheno@yahoo.com
PS: If you are wondering where I am getting my opinions from I have been doing this since 1995, and I currently work as a Java Architect (So, I lead Designers, who lead Developers).

Anonymous said...

Rather than having to have java.util2 or some such, can we just admit that type safety, like any other goal, is not an absolute paragon of virtue?

I'm sure there's a more limited form of reification that is possible without having to introduce 2 collection classes in place of each one that exists today.

For instance, type information could be conditionally available at runtime in cases where the object in question is created by a class compiled with -source/-target of a some later JVM version which would capture type information during creation, etc.

This information could be used for things like T.class without a warning in cases the compiler can check apart from unsafe castes (e.g. within an instance method of SomeCollection[T], for instance) and with a warning (and possible runtime error) in other instances.

Yes, there's some possible type safety holes but we have interoperable collections -- not 2 different, redundant collection packages ala .NET.

Chris Cleveland said...

I've come up with an alternative to Java Generics. Comments welcome.
http://jroller.com/page/ccleve

Neal Gafter said...

@Chris: Your proposal certainly is simple. That's because it comes without a specification of the semantics. I think you'll find that when you attempt to specify the semantics, and particularly the overriding and subtyping behavior, things aren't so simple or clean. But I'll be happy to be proven wrong. Another possible first step for be for you to provide an implementation of your idea in ksl. Either way, I look forward to seeing how you turn your idea into a language construct.

Chris said...

Would reified generics allow me to declare my class like this?

public class Foo implements Comparable<Foo>, Comparable<Bar>
{
}

paulo said...

Would refeicated generics allow us to create a return for a method that would always return the type of this? This question was asked in the java desktop forums, in relation to introducing method chaining to the swing api, without having to redefine each method in each subclass

Bruno T. said...

arithmetic in java seem very awful
to me,
object wrapper of primitive types dont have addiction, subtraction and so on...
so the programmer must surf to primitive types and object and back, autoboxing help a little,
but dont mix well with generics, for example, imagine a class that add 2 number, without generics i write:

public class fAdd1
{

public Double sum( Double a, Double b )
{
return a + b;
}

public Float sum( Float a, Float b )
{
return a + b;
}

public Integer sum( Integer a, Integer b )
{
return a + b;
}

public Long sum( Long a, Long b )
{
return a + b;
}

}


it is very redundant, using generics could be little bit simple:


public class fAdd2 < T extends Number >
{

public T sum( T a, T b )
{
return a+b;
}

}

at compile time:

.\fAdd2.java:6: operator + cannot be applied to T,T
return a+b;


seem that type erasure and autoboxing, dont mix.

gnz said...

I'm anxiously waiting for the day when Java gets reified generics!

Reified generics are better for a million of reasons; two priorities for me are:

1. Behave as expected (principle of least surprise).
2. Performance! Amazing that very few people mentions this one on erasure-vs-reified debates; e.g. reified generics a la NET can be heavily optimized for primitives, removing boxing overhead.

Regarding a new syntax for reified, I would prefer something uncluttered like:

public class List<T'> {
}

But I really don't care about the syntax all that much as long as the feature is available!

Anonymous said...

"Reified types are prone to combinatorial explosion - your program will spend a lot of time constructing and taking apart types, and will use quite some heap space for representing nested parameterized."

People have decades of experience with reified generics, and this simply isn't true. In fact, operations on reified generics are no different from operations on generics with erasure: method calls, member access, etc. all still behave the same way.

The only thing that differs is the availability of runtime information and better static type checking.

Cow_woC said...

Hey Neal,

I posted an alternate Generics proposal on http://java.dzone.com/news/java-5-sucks-according-ibatis- which I would appreciate your feedback on.

Essentially, code compiled against non-Generic List would get interpreted as List<Object> under JDK 1.5. Then if this code tries to invoke Generic code you cast from List<Object> to List<T> at runtime as necessary.

It is my understanding that this approach would provide full migration support. You gain Reification but you lose performance when running JDK 1.4 code on newer JREs. Still, I believe this is a good trade-off and as code is upgraded to use Generics the performance penalty goes away.

http://java.dzone.com/news/java-5-sucks-according-ibatis- explains what I mean in more detail, including some sample code. Please let me know if I missed anything obvious.

Thank you,
Gili

Neal Gafter said...

Dozens of language "proposals" are now generated weekly by people incompetent to the task. Sometimes the author is quietly tolerated by the community until their attention to the issue wanes. Sometimes a more persistent author attempts to pursue their proposal, and eventually comes to realize that they are out of their depth. Occasionally an individual with a complete lack of talent never comes to appreciate the futility of their approach, and attempts to recruit support from a wider community of amateurs.

If you have a language proposal, you will need to consider far more than syntax. See Joe Darcy's excellent introduction at http://blogs.sun.com/darcy/entry/so_you_want_to_change.

For more ambitious changes that affect the type system, you should create a working prototype of your proposal to help refine your ideas. The attempt will at least help you learn if your ideas hang together. If they do, you'll want to formally specify the system (or find an interested academic who can help you do so) in a technical paper and submit it to a peer-reviewed journal or conference such as OOPSLA or TOPLAS. That's an excellent way to get some attention for a possible language change.

John P said...

Oh that was so beautifully put! Thanks for a chuckle.

Anonymous said...

how would this new feature affect reflection?
At the moment I can go around generics type safe compile time check using reflection, allowing xewample something like:

Vector< String > vect = new Vector< String >();
Method mtd = vect.getClass().getMethod("add",Object.class);
mtd.invoke(vect, new Date());

Free Programmer said...

Why does not just use some annotation for classes with reified generics?

@Generic(ERASED|REIFIED)

why introduce new language element?

Neal Gafter said...

@Free: annotations are not supposed to change the behavior or language primitives. That's the role of modifiers.

Free Programmer said...

But actually they are.

@Target & @Retetion annotations affect the generated code.

@SuppressWarnings("unchecked") directly connected to the generics issues discussed here.

@Override forces compiler to check inheritance.

Why not another one? Especially concerning that this could be a transitive feature for the language.

Neal Gafter said...

@Free: None of those annotations has any effect on the bytecode instructions generated for Java statements or expressions.

Free Programmer said...

@Retention controls whether annotation will be included into classfile/available at runtime or not - this is actually type-related information. Isn't it something similar to what is discussed here - availability of some type information at runtime?

@Target controls applicability of annotation - is there any significant difference between compiler type error on method application to incorrectly typed parameters and annotation application to incorrect "type" of annotation target?

Neal Gafter said...

@Free: @Target only affects the legality of annotations. It does not affect the generated bytecodes of statements or expressions. Similarly, @Retention controls what information is preserved statically, but that doesn't affect the dynamic semantics of Java code. Reification would require a modification of the dynamic semantics of Java code and a corresponding change to the generated bytecode, as the currently generated bytecode does not provide enough information for type parameters to be preserved.

Gili said...

@Neal, changing the bytecode doesn't seem like a big deal. Every single release since Java5 has contained a major bytecode change.

Neal Gafter said...

@Gili: adding new bytecodes to the JVM specification is not what I'm talking about. I'm talking about changing the behavior of existing Java code.

rgomes1997 said...

Hi,

The following article shows how type parameters can be retrieved at runtime. There are some use cases and pointers to utility classes.

Using Type Tokens to retrive generic parameters

Thanks

Richard Gomes

Neal Gafter said...

Richard: thank you for the reference to your unattributed description of my ideas, also documented at http://gafter.blogspot.com/2006/12/super-type-tokens.html and http://gafter.blogspot.com/2007/05/limitation-of-super-type-tokens.html

Anonymous said...

I tried that sneaky trick you posted, and it doesn't work Neal:

inconvertible types
found : java.util.ArrayList<java.lang.Object>
required: java.util.List<java.lang.String>

So I think it should be relatively safe to assume that the JDK is free of such code.

So far I haven't seen a good use case for reified generics. It seems to me like just adding bloat and I certainly hope such things are not added to Java anytime soon. If it is, I hope it is added in such a way that it will only record and run-time use extra type information in situations that can't be checked compile-time, and that the compiler can be made to issue a warning in such cases.

Neal Gafter said...

@Anonymous: try casting to an intermediate "Object" type.

Mohamed said...

http://www.scribd.com/doc/163398138/Java-Reflection-with-Generics-Practical-Case