Unicode Shell for Java: Windows PowerShell ISE

I finally figured out how to get my Java programs to write Unicode to a Windows shell and have it render correctly. Here’s the final result:

DOS vs Powershell for Unicode

And here’s how I did it.

1. Fire Up PowerShell ISE

My new Windows 7 Professional notebook comes with an application called PowerShell ISE (make sure to run the ISE version; the unmarked one is more like DOS and has the same problems noted below). The “ISE” is for “integrated scripting environment”.

It defaults to the Consolas font, which looks a lot like Lucida Console.

2. Set Character Encoding to UTF-8

This’ll set the shell’s active code page to 65001, which is Windows’ identifier for UTF-8.

> chcp 65001
Active code page: 65001
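
If you want to see what the JVM itself reports before moving on, here’s a minimal sketch using only the standard library. The class and method names are made up for illustration. I’d assume the default charset Java reports need not match the code page you just set, which is why the next step sets the encoding explicitly rather than relying on defaults.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original post.
public class CharsetReport {

    // Summarizes the JVM's default charset and whether UTF-8 is available.
    static String describeDefaults() {
        return "default charset: " + Charset.defaultCharset().name()
            + ", UTF-8 supported: " + Charset.isSupported("UTF-8");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```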

3. Set Java System.out to UTF-8

Before writing to System.out, Java’s static field for the standard output stream, you’ll need to wrap it in a PrintStream that encodes as UTF-8 and install that with System.setOut. It’s a one-liner.

System.setOut(new PrintStream(System.out,true,"UTF-8"));

The true value enables auto-flushing, which is a good idea for standard output.
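
To check what a UTF-8 PrintStream actually emits without involving a console at all, you can point one at an in-memory buffer and look at the bytes. This is only a sketch: the class and method names are hypothetical, and it uses a ByteArrayOutputStream in place of System.out.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of the original post.
public class Utf8OutCheck {

    // Writes text through a UTF-8 PrintStream and returns the raw bytes.
    static byte[] encodeViaPrintStream(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    // Formats bytes as space-separated, two-digit uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeViaPrintStream("\u00E9"))); // C3 A9
        System.out.println(toHex(encodeViaPrintStream("\u0B92"))); // E0 AE 92
        System.out.println(toHex(encodeViaPrintStream("\u4E52"))); // E4 B9 92
    }
}
```

If the bytes come out as expected but the shell still shows garbage, the problem is on the shell or font side rather than in the Java encoding.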

Demo Program

Here’s a simple test program (the backslash escapes are Java literals for Unicode code points).

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Test {

    public static void main(String[] args) 
        throws UnsupportedEncodingException {

        PrintStream utf8Out
            = new PrintStream(System.out,true,"UTF-8");
        System.setOut(utf8Out);

        System.out.println("English: Hello");
        System.out.println("French: D\u00E9j\u00E0 vu");
        System.out.println("Tamil: \u0B92");
        System.out.println("Han: \u4E52");
    }

}

You can see the output in action in the screen dumps at the top of this post.

Problems with DOS

The DOS shell will let you set the character encoding and change the font to Lucida Console. But it oddly duplicates characters at the end of lines if the lines contain non-ASCII code points. You can see this in the screenshot at the top of this post. And it can’t handle the Tamil or Han characters.

7 Responses to “Unicode Shell for Java: Windows PowerShell ISE”

  1. lingpipe Says:

    Having said all this, the BASH shell on Linux (relatively recent Ubuntu) does a great job of handling and rendering Unicode.

    The GNU Emacs version lets you paste into it, but doesn’t do such a great job at rendering.

    Anyone have any idea how the Mac handles Unicode in its shell and text editors?

  2. XP1 Says:

    How do I get a program to work if it retrieves inputs from the user?:

    Scanner scanner = new Scanner(System.in);
    scannerInput.nextLine()

    I get an “Already running command. Please wait.” message which does not let me type and enter an input.

  3. XP1 Says:

    Sorry, meant:
    Scanner scanner = new Scanner(System.in);
    scanner.nextLine();

    • lingpipe Says:

      No idea. I’ve never used the Scanner. Have you checked to see if there’s another Java process running on the machine that may have grabbed your resource?

  4. XP1 Says:

    @lingpipe,
    CMD.EXE, NetBeans, and Eclipse all let me run and enter input, but only the Windows PowerShell ISE pauses where there is supposed to be user input and says “Already running command. Please wait.” in the status bar. Maybe Windows PowerShell ISE thinks that whatever I am typing is a PowerShell command and wants me to wait until java.exe is done executing?

    Have you tried to retrieve user input in Windows PowerShell ISE? Maybe not Scanner, but maybe there is another way which works?
