I finally figured out how to have my Java programs write Unicode to a Windows shell and have it display correctly. Here’s the final result (screenshot):
And here’s how I did it.
1. Fire Up PowerShell ISE
My new Windows 7 Professional notebook comes with an application called PowerShell ISE (make sure to run the ISE version; the unmarked one is more like DOS and has the same problems noted below). The “ISE” is for “integrated scripting environment”.
It defaults to the Consolas font, which looks a lot like Lucida Console.
2. Set Character Encoding to UTF-8
Run chcp with code page 65001 to set the shell’s active code page to UTF-8. The shell echoes the new code page back.
> chcp 65001
Active code page: 65001
3. Set Java System.out to UTF-8
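Before writing to System.out, Java’s standard output stream, you’ll need to set it up to use UTF-8. It’s a one-liner, though the PrintStream constructor declares UnsupportedEncodingException, so the calling method has to catch or declare it.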
System.setOut(new PrintStream(System.out,true,"UTF-8"));
The true value enables auto-flushing, which is a good idea for standard output.
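To check that a stream built this way really emits UTF-8, independent of what the shell can render, one option is to point the same kind of PrintStream at an in-memory buffer and look at the bytes. This is only a minimal sketch; the class name Utf8OutCheck and the helper utf8Hex are made up for illustration and are not part of the original program.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8OutCheck {

    // Hypothetical helper: pushes text through a UTF-8 PrintStream
    // (same constructor as in step 3) and returns the bytes as hex.
    public static String utf8Hex(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00E9")); // prints C3 A9
        System.out.println(utf8Hex("\u0B92")); // prints E0 AE 92
        System.out.println(utf8Hex("\u4E52")); // prints E4 B9 92
    }
}
```

The output is plain ASCII, so it reads the same in any shell. If the bytes look right here but the characters still look wrong on screen, the problem is more likely the shell’s code page or font than the Java side.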
Demo Program
Here’s a simple test program (the backslash escapes are Java literals for Unicode code points).
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Test {

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap standard output so characters are encoded as UTF-8;
        // true enables auto-flushing, as recommended in step 3.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        System.setOut(utf8Out);

        System.out.println("English: Hello");
        System.out.println("French: D\u00E9j\u00E0 vu"); // é and à
        System.out.println("Tamil: \u0B92");             // U+0B92
        System.out.println("Han: \u4E52");               // U+4E52
    }
}
You can see the output in action in the screen dumps at the top of this post.
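If it is unclear which code points the backslash escapes in the demo stand for, a small helper can list them in ASCII-only U+XXXX notation. This is a sketch under stated assumptions; the class name CodePointDump and the method codePoints are hypothetical names chosen for this example.

```java
public class CodePointDump {

    // Hypothetical helper: lists the Unicode code points of a string
    // as space-separated U+XXXX values.
    public static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on whether the
            // code point needs a surrogate pair.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints U+0044 U+00E9 U+006A U+00E0 U+0020 U+0076 U+0075
        System.out.println(codePoints("D\u00E9j\u00E0 vu"));
        // prints U+0B92
        System.out.println(codePoints("\u0B92"));
        // prints U+4E52
        System.out.println(codePoints("\u4E52"));
    }
}
```

Stepping with Character.charCount rather than one char at a time keeps the listing correct for characters outside the Basic Multilingual Plane, which Java stores as two chars.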
Problems with DOS
The DOS shell will let you set the character encoding and change the font to Lucida Console. But it oddly duplicates characters at the end of lines if the lines contain non-ASCII code points. You can see this in the example screenshot. And it can’t handle the Tamil or Han characters.
July 14, 2010 at 2:49 pm
Having said all this, the BASH shell on Linux (relatively recent Ubuntu) does a great job of handling and rendering Unicode.
The GNU Emacs version lets you paste into it, but doesn’t do such a great job at rendering.
Anyone have any idea how the Mac handles Unicode in its shell and text editors?
October 3, 2010 at 11:48 pm
How do I get a program to work if it reads input from the user? For example:
Scanner scanner = new Scanner(System.in);
scannerInput.nextLine()
I get an “Already running command. Please wait.” message which does not let me type and enter an input.
October 3, 2010 at 11:49 pm
Sorry, meant:
Scanner scanner = new Scanner(System.in);
scanner.nextLine();
October 4, 2010 at 2:58 pm
No idea. I’ve never used the Scanner. Have you checked to see if there’s another Java process running on the machine that may have grabbed your resource?
October 4, 2010 at 6:09 pm
@lingpipe,
CMD.EXE, NetBeans, and Eclipse all let me run and enter input, but only the Windows PowerShell ISE pauses where there is supposed to be user input and says “Already running command. Please wait.” in the status bar. Maybe Windows PowerShell ISE thinks that whatever I am typing is a PowerShell command and wants me to wait until java.exe is done executing?
Have you tried to retrieve user input in Windows PowerShell ISE? Maybe not Scanner, but maybe there is another way which works?
October 4, 2010 at 6:43 pm
The web knows this stuff way better than me. I searched [powershell ise input] and found the answer in the snippets. Here’s a longer story:
http://blogs.msdn.com/b/powershell/archive/2009/02/04/console-application-non-support-in-the-ise.aspx
October 7, 2010 at 4:49 pm
Looks like user input in Windows PowerShell ISE is unsupported.
All “start java …” does is pop up a regular CMD window, which has the display problems described in the post.
:(