Serializing Immutables and Singletons with a Serialization Proxy

[Update: 3 January 2011. Blog commenter Konrad correctly points out a serious error in the original version of this post. It turns out you can serialize immutable objects because there's no requirement that there be a public no-argument constructor. Konrad's example code in the comment provides an example.]

A question recently came up on our mailing list/newsgroup that asked how to implement java.io.Serializable for classes containing non-serializable singletons. I decided to kick up the abstraction level of the answer and discuss LingPipe’s serialization proxy approach to serializing everything, including singletons and immutables.

Josh Bloch discusses the serialization proxy pattern in the very last section of the second edition of Effective Java. Java lets you take complete control over serialization, and I strongly advise you to use this control for forward compatibility, working around non-serializable (but constructable) member objects like singletons, and minimizing the size of serialized objects.

A Simple Immutable Class

Suppose we have a simple immutable class we want to serialize:

public class Vector2D {
    final double mX;
    final double mY;
    public Vector2D(double x, double y) { mX = x; mY = y; }
    public double x() { return mX; }
    public double y() { return mY; }
    public double length() {
        return Math.sqrt(x()*x() + y()*y());
    }
}

This class is immutable. The reason to prefer immutable classes is discussed at the beginning of Bloch’s book; the basic idea is that immutables guarantee thread safety (assuming the immutable member variables are thread safe for reads) and consistent object state (guaranteed by the constructor). The only constructor takes two arguments, and sets the final variables for the x and y coordinates of the vector. For illustration, there’s only a single non-trivial method computing vector length, but the same basic argument applies for any immutable class. A more robust implementation might require values to be finite (that is, not infinite and not not-a-number).

If we simply declare Vector2D to implement Serializable, we have a problem. It’ll write OK. [Update: The following is wrong and has thus been redacted.] But trying to read it back in causes a problem when reflection tries to invoke a nullary (no-argument) constructor Vector2D(), which doesn’t exist. [Update: See my reply to Konrad's comment to see where my thinking went wrong and why you still need custom serialization to deal with immutables that have consistency conditions on their members enforced through the constructor.]
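To make the update concrete, here is a minimal sketch showing that default deserialization restores the fields of an immutable class without running its constructor. The class name PlainVector, the counter sConstructorCalls, and the roundTrip() helper are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConstructorBypassDemo {

    // Hypothetical stand-in for Vector2D that relies on default serialization.
    static class PlainVector implements Serializable {
        private static final long serialVersionUID = 1L;
        static int sConstructorCalls = 0; // static, so it is never serialized
        final double mX;
        final double mY;
        PlainVector(double x, double y) {
            ++sConstructorCalls; // records every real constructor invocation
            mX = x;
            mY = y;
        }
    }

    // Serialize to an in-memory byte array, then read the object back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in
                 = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlainVector v = new PlainVector(3.0, 4.0);
        PlainVector w = (PlainVector) roundTrip(v);
        System.out.println(w.mX + ", " + w.mY);   // prints "3.0, 4.0"
        // The constructor ran once for v and not at all for w.
        System.out.println("constructor calls: " + PlainVector.sConstructorCalls); // prints 1
    }
}
```

Because the constructor is skipped, any validity check placed in it is skipped too, which is the motivation for the proxy described next.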

Serialization Proxy

Before serializing an object, Java checks to see if the class implements the method Object writeReplace(). If so, when an instance of the class is serialized, the value returned by writeReplace() is serialized instead of the object itself.

When an instance of a class is being deserialized, Java checks to see if it implements Object readResolve(). If it does, after the usual steps of deserialization, the return value of readResolve() is returned as the result of deserialization.

The serialization proxy typically employs a static, private nested class (called the “serialization proxy”) that is serialized in place of the class through writeReplace(). The proxy itself stores all the information needed to reconstitute the class being serialized. The method readResolve() is then used to return an instance of the original, immutable class. It sounds like a mouthful, but is really quite simple:

public class Vector2D implements Serializable {

    private final double mX;
    private final double mY;
    public Vector2D(double x, double y) { mX = x; mY = y; }
    ...
    // the proxy, not the vector, is what gets written to the stream
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    private static class SerializationProxy
        implements Serializable {

        double mX; double mY;
        public SerializationProxy() { }
        public SerializationProxy(Vector2D vector) {
            mX = vector.x();  mY = vector.y();
        }
        // rebuilds the vector through its public constructor
        Object readResolve() {
            return new Vector2D(mX,mY);
        }
    }
}

That’s it. What’s really nice is that it doesn’t affect the original class (Vector2D) interface. The writeReplace() method and nested static serialization proxy class Vector2D.SerializationProxy are both private.
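As a rough sketch of how this behaves end to end, the following self-contained version round-trips a vector through an in-memory stream. The finiteness check in the constructor, the readObject() guard (a further defense Bloch recommends), the serialVersionUID values, and the roundTrip() helper are assumptions added for illustration and are not part of the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProxyRoundTrip {

    static class Vector2D implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double mX;
        private final double mY;

        Vector2D(double x, double y) {
            // Assumed consistency condition: both coordinates must be finite.
            if (Double.isNaN(x) || Double.isInfinite(x)
                || Double.isNaN(y) || Double.isInfinite(y))
                throw new IllegalArgumentException("coordinates must be finite");
            mX = x;
            mY = y;
        }
        double x() { return mX; }
        double y() { return mY; }

        // Serialize the proxy in place of this object.
        private Object writeReplace() {
            return new SerializationProxy(this);
        }

        // Reject any stream that contains a Vector2D written without the proxy.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("serialization proxy required");
        }

        private static class SerializationProxy implements Serializable {
            private static final long serialVersionUID = 1L;
            private final double mX;
            private final double mY;
            SerializationProxy(Vector2D v) { mX = v.x(); mY = v.y(); }
            // Runs the real constructor, so its checks apply to stream data.
            private Object readResolve() {
                return new Vector2D(mX, mY);
            }
        }
    }

    // Serialize to an in-memory byte array, then read the object back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in
                 = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Vector2D v = new Vector2D(3.0, 4.0);
        Vector2D w = (Vector2D) roundTrip(v);
        System.out.println(w.x() + " " + w.y()); // prints "3.0 4.0"
    }
}
```

Since readResolve() goes through the constructor, tampered bytes that encode a non-finite coordinate would fail with an IllegalArgumentException instead of producing an inconsistent instance.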

Serializing Singletons

Suppose we have a singleton and we want to make it serializable. Continuing our earlier example, let’s define a singleton:

public class Origin2D extends Vector2D {
    private Origin2D() { super(0.0,0.0); }
    public static final Vector2D ORIGIN
        = new Origin2D();
}

(Yes, I know that this isn’t a good example of a singleton because you can construct the same vector directly, but it’ll suffice for this example.)

Because Origin2D is a singleton, we don’t have a public nullary constructor. But even if we did, we’d have the problem that the deserialized instance would be a new instance, thus defeating the singleton pattern. Here all we need to do is to define a readResolve() method to return the singleton:

public class Origin2D extends Vector2D {
    private Origin2D() { super(0.0,0.0); }
    public static final Vector2D ORIGIN
        = new Origin2D();
    // replace whatever was deserialized with the one true instance
    private Object readResolve() { return ORIGIN; }
}
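The effect can be checked with a short round trip. This is only a sketch: for brevity the Vector2D here uses default serialization rather than the proxy, and the roundTrip() helper and serialVersionUID values are hypothetical additions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SingletonRoundTrip {

    // Simplified base class using default serialization.
    static class Vector2D implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double mX;
        private final double mY;
        Vector2D(double x, double y) { mX = x; mY = y; }
        double x() { return mX; }
        double y() { return mY; }
    }

    static class Origin2D extends Vector2D {
        private static final long serialVersionUID = 1L;
        private Origin2D() { super(0.0, 0.0); }
        static final Vector2D ORIGIN = new Origin2D();
        // Discard the freshly deserialized copy and hand back the singleton.
        private Object readResolve() { return ORIGIN; }
    }

    // Serialize to an in-memory byte array, then read the object back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in
                 = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(Origin2D.ORIGIN);
        // Reference equality holds because readResolve() returned ORIGIN.
        System.out.println(copy == Origin2D.ORIGIN); // prints "true"
    }
}
```

Without the readResolve() method, the comparison would print false, because deserialization would produce a second Origin2D instance.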

Complete Control with Externalizable

LingPipe takes even more control by defining the serialization proxies to implement the interface java.io.Externalizable, which extends Serializable. It defines two methods, writeExternal(ObjectOutput) and readExternal(ObjectInput). If an object implements Externalizable, these methods are called instead of the default reflection-based serialization, which simply tries to serialize each of the member objects in turn (writing primitive values directly using DataOutput and DataInput). If a class implements Externalizable, only the fully qualified class name and serial version ID are written automatically; the class itself is responsible for all other serialization and deserialization. An Externalizable class must also have a public no-argument constructor, which deserialization invokes before calling readExternal().
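A minimal sketch of an Externalizable proxy for the running Vector2D example follows. It uses only the standard java.io interfaces; the nested class name Externalizer, the roundTrip() helper, and the serialVersionUID values are hypothetical choices for this illustration and are not taken from LingPipe's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ExternalizableProxyDemo {

    static class Vector2D implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double mX;
        private final double mY;
        Vector2D(double x, double y) { mX = x; mY = y; }
        double x() { return mX; }
        double y() { return mY; }

        // Serialize the externalizable proxy in place of this object.
        private Object writeReplace() {
            return new Externalizer(this);
        }

        private static class Externalizer implements Externalizable {
            private static final long serialVersionUID = 1L;
            private double mX; // not final: assigned in readExternal()
            private double mY;

            // Externalizable requires a public no-argument constructor.
            public Externalizer() { }
            Externalizer(Vector2D v) { mX = v.x(); mY = v.y(); }

            // Write exactly two doubles and nothing else.
            @Override
            public void writeExternal(ObjectOutput out) throws IOException {
                out.writeDouble(mX);
                out.writeDouble(mY);
            }

            // Read the two doubles back in the same order.
            @Override
            public void readExternal(ObjectInput in) throws IOException {
                mX = in.readDouble();
                mY = in.readDouble();
            }

            // Hand back a real Vector2D built through its constructor.
            private Object readResolve() {
                return new Vector2D(mX, mY);
            }
        }
    }

    // Serialize to an in-memory byte array, then read the object back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in
                 = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Vector2D w = (Vector2D) roundTrip(new Vector2D(1.5, -2.5));
        System.out.println(w.x() + " " + w.y()); // prints "1.5 -2.5"
    }
}
```

The proxy's fields cannot be final here because readExternal() assigns them after the no-argument constructor has run; the reconstructed Vector2D itself stays immutable.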

LingPipe’s AbstractExternalizable

LingPipe provides an abstract base class, com.aliasi.util.AbstractExternalizable, which may be used as a serialization proxy. It has two abstract methods, Externalizable’s writeExternal() and Object readObject(ObjectInput), whose return value is used in the concrete implementation of readResolve(). It also has some static utility methods to read and write objects to streams.

Serial Version ID

I didn’t add serial version IDs to the example classes to keep the explanation of the proxies simple. You should add these to all serializable classes so that if the class changes, it won’t break serialization. That is, it preserves forward compatibility. This’ll typically be a declaration of the form:

public class Vector2D implements Serializable {
    static final long serialVersionUID = -123456789L;
    ...
}

Every time Java serializes an object, it writes the object’s serial version ID. If one is not defined in a static variable named serialVersionUID, one is computed through reflection. So defining one also speeds up the serialization/deserialization process. If you default to the reflection-based version and the class changes, so will the ID, and you’ll get conflicts when trying to read in previously serialized objects. If you define the ID yourself from the beginning, the value doesn’t matter; if you have a class released into the wild, you should use Java’s serialver utility (which is distributed with Sun’s JDK) to compute the value created by reflection to ensure backward compatibility.
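The ID that Java associates with a class can also be inspected programmatically with the standard java.io.ObjectStreamClass API, which reports the declared value if there is one and the computed default otherwise. The sketch below uses two hypothetical classes, Declared and Computed, and a hypothetical helper idOf().

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {

    // Declares its ID explicitly, so the value survives changes to the class.
    static class Declared implements Serializable {
        private static final long serialVersionUID = -123456789L;
        double mX;
    }

    // Declares no ID, so Java derives one from the class's structure.
    static class Computed implements Serializable {
        double mX;
    }

    // Look up the serial version ID that serialization would use for a class.
    static long idOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(idOf(Declared.class)); // prints "-123456789"
        // The computed value depends on the class's members and signatures.
        System.out.println(idOf(Computed.class));
    }
}
```

Adding a field or method to Computed would generally change its reported ID, whereas the ID of Declared stays fixed until the declaration itself is edited.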

Compilable versus Serializable

LingPipe also provides the util.Compilable interface, which defines a method void compileTo(ObjectOutput). We use this method to compile objects like language-model classifiers, whose compiled forms are very different from their regular forms. Some of these classes, like TradNaiveBayesClassifier, also implement Serializable, which writes and reads back in an object with the same behavior as the one serialized. The deserialized form allows further training; the compiled form is faster and more memory efficient.

6 Responses to “Serializing Immutables and Singletons with a Serialization Proxy”

  1. João Says:

    Great explanation, thanks. I definitely must get a copy of Bloch’s Effective Java.

  2. Mahendra Says:

    Very good article, easy to understand.

  3. Konrad Says:

    I don’t understand this, my jvm can deserialize Immutables just fine without any Proxys. Maybe this was changed in Java 1.6? The following code works fine for me:

    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class Main
    {
        private static class Point2D implements Serializable
        {
            final double mX;
            final double mY;
            public Point2D(int x, int y) { mX = x; mY = y; }
            public double x() { return mX; }
            public double y() { return mY; }
            public double length() {
                return Math.sqrt(x()*x() + y()*y());
            }
        }

        public static void main(String[] args) throws FileNotFoundException, IOException, ClassNotFoundException
        {
            Point2D p = new Point2D(100,200);
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("test.ser"));
            out.writeObject(p);
            out.close();

            ObjectInputStream in = new ObjectInputStream(new FileInputStream("test.ser"));
            Point2D q = (Point2D)in.readObject();
            in.close();

            System.out.println(q.x());
            System.out.println(q.y());

        }
    }

    • Bob Carpenter Says:

      You’re right. That definitely works. Thanks for the really clear counterexample. I’ll flag the main body content accordingly.

      Let me make a couple of additional points.

      1. You still need custom deserialization for singletons so that you always get the same object back.

      2. As Josh Bloch explains in Effective Java, you still need to customize serialization for immutable objects as I described above if they have consistency conditions on their members. The problem is that you can serialize an instance, tweak the bytes representing numbers, then deserialize. Because default deserialization bypasses the constructor (easily seen by putting a print in the constructor itself), it bypasses any consistency checks in the constructor.

      I think this second point is why I was confused about the base case. Thanks again for pointing out my error.

  4. Ashok Says:

    Nice post. Thanks

  5. Jazoon 2012: Serialization: Tips, Traps, and Techniques « Dark Views Says:

    [...] Serializable Proxy Pattern solves many of the [...]
