We’ve decided to refactor LingPipe for generics for the 3.0 release. After three solid days of work (and untold nights and mornings reading about generics), I’ve finished the refactor of com.aliasi.util
; the Iterators
class and SmallSet
were rather challenging for a rookie like me.
My major complaint about Java when I first saw it was the lack of parametric typing. Many, if not most, of my programming errors are from assigning something to the wrong argument. I’m hopeless in Perl, for instance. As always, be careful what you wish for. Little prepared me for the complexity of Java 1.5 generics.
My hat’s off to the architects and developers of this amazing retro-fit. The addition of generics to a non-generic language is a heroic feat of backward compatibility engineering. What I should’ve realized before I started, but only dawned on me about 10 classes in is that just adding types doesn’t break backward compatibility anywhere. That’s why it’s possible to run 1.4-style programs in a 1.5 or 1.6 compiler or JVM. Very very cool.
Luckily, using generic classes is a whole lot easier than writing them. Even writing them is easy if you’re only worried about type safety and saving some work, and not on making something general and extensible.
My main gripe has to do with arrays. As in pre-generics Java, they’re the poor cousins of types. There’s no way to create arrays of generic types without having a handle on the java.lang.Class
object for the type being created. For instance, there are two toArray()
methods in java.util.Collection
. Suppose we have a Collection<E>
. The first array creation method has the signature Object[] toArray()
, not E[] toArray()
as one might expect, because it’s simply not possible to define the latter. The second array creation method has the signature <T> T[] toArray(T[] a)
, which is problematic in two ways. First, you need to pass in an array to fill. Second, you can still generate type errors if you do something like this:
import java.util.*; public class Test { public static void main(String[] args) { List<String> xs = new ArrayList<String>(); xs.add("foo"); Integer[] result = xs.<Integer>toArray(new Integer[10]); } }
If you omit the second line (in bold), the program compiles and runs with no complaint. If you add the second line, the program compiles, but throws a runtime exception during the underlying call to System.arrayCopy()
.
In LingPipe, I often used arrays exactly because they were type safe compared to returning collections. They also tend to be smaller than collections in memory, as there’s no overhead for wrappers, buckets, tree balancing counts, etc. Now I’ve painted myself into a corner with generics.
I’ve had to replace some return types that would’ve been E[]
for a type variable E
, with List<E>
, because it’s the only way I could ensure type safety without requiring a class object (or array) be passed in. Luckily, none of that stuff’s in deep loops. That’s all arrays of primitive types, which nobody ever expected to play nicely with generics or anything else.
As always, my advice to someone wanting to learn Java types is to study the java.util
collections framework. That, and the Kernighan and Ritchie of Java:
Ken Arnold, James Gosling and David Holmes. 2006. The Java Programming Language, Fourth Edition. Addison-Wesley.
I had to read Chapter 11 (Generic Types) many many many many times. Check out Section 16.10.1 (Genericity and Dynamic Arrays) for a deep explanation of what’s going on in the above example.
Here are some links to complement the book:
- Gilad Bracha’s Generics Tutorial. The Sun-recommended tutorial.
- Sun’s Java Tutorial:Generics Trail
- Angelika Langer’s Generics FAQ
- Ken Arnold (author of above book) wrote a polemic blog entry Generics Considered Harmful in which he ‘fesses up that "writing generified classes is rocket science" *
*Note that Arnold actually got his types wrong and his example should be T[] toArray(Class<T>)
; makes me think he didn’t read his own book!
Update: April 11, 2007
It occurred to me yesterday that java.util.ArrayList
must allocate generic arrays, because it implements List<Egt;
based on arrays. Checking out Java’s source shows you how to “cheat” the type system:
public class ArrayList<E> ... private transient E[] elementData; public ArrayList(int initialCapacity) { ... this.elementData = (E[])new Object[initialCapacity]; }
It’s easy to forget that types are simply erased in the implementation. This kind of operation is safe if it’s under your control, as it is in java.util.ArrayList
and many of the LingPipe internals. It’ll still cause the compiler to gripe. I think I can live with that.
Update April 26, 2007: I’m done. We’re about to release 3.0. But there was one final bug that Breck caught compiling on 1.6. Turns out that it’s a known issue with Java 1.6 throwing an assertion error from the compiler. It’s already in Sun’s DB of bugs, as bug 6500207. It arises with conjunctions of types in generic restriction. The bug report has a simple example; it arose for us in a class that wasn’t going out with 3.0 anyway, an EM/soft clusterer with Dirichlet priors, so hopefully it’ll be fixed soon.
Leave a Reply