
pyhi: Python Package/Module Hello World with all the Trimmings

January 7, 2011

I’ve been teaching myself Python, and being the compulsive neat freak that I am, I first had to figure out its namespaces and how to package everything properly.

pyhi: a Demo Python Package with all the Trimmings

If you want a skeletal application that does everything the right way (as far as I can tell from the online style recommendations), including package/module namespaces, unit tests, an installer, documentation, and packaged scripts, check out:

Of course, I’m happy to get advice if there are better ways to do this.

Why Python?

I’m working with Andrey Rzhetsky and James Evans at University of Chicago on a version of the Bayesian (and EM) annotation models in Python. I’m also working with Matt Hoffman, Andrew Gelman and Michael Malecki on Python at Columbia for Bayesian inference. Watch this space (and Andrew’s blog) for news.

Does Python Rock?

The short story is that I learned enough in a week to already use it for scripting and munging instead of Java. Compared to Perl, it’s a positively genius work of minimalism and consistency. Everything works pretty much the way you’d expect. When you need C/Fortran back ends (all the optimization, distribution, and matrix libs), Python’s a relatively clean front end. Numpy and PyMC are nice; the design of PyMC is particularly well thought out.

I love the generators and named arguments/defaults. I hate the whitespace syntax (no moving blocks of code with emacs to auto-indent). I wish I had a bit more control over types and pre-allocation, but that’s easily solved with utility functions.
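A minimal sketch of the two features I like, generators and named arguments with defaults. The function name running_mean and its parameters are made up for illustration:

```python
def running_mean(values, start=0.0, count=0):
    """Lazily yield the mean of everything seen so far."""
    total = start
    n = count
    for v in values:
        total += v
        n += 1
        # The generator pauses here until the caller asks for the next value.
        yield total / n


# Defaults mean the common case needs only one argument.
means = list(running_mean([2.0, 4.0, 6.0]))

# Named arguments make it obvious which optional value is being overridden.
resumed = list(running_mean([4.0], start=2.0, count=1))
```

Nothing is computed until the caller iterates, so the same function works on a list or on an unbounded stream.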

At least as of version 2.6, character strings are the usual mess (Java’s became a mess when Unicode surpassed 16-bit code points), with one type for bytes and a different one for Unicode strings (sort of like Java, only there are no built-in Java types for byte-sequence literals).
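A small sketch of the two string types in action. The variable names are illustrative, and the choice of UTF-8 is my assumption; any encoding would make the same point:

```python
# A Unicode string: five characters, one of them outside ASCII.
text = u"na\u00efve"

# Encoding turns characters into a byte sequence; the accented
# character takes two bytes in UTF-8, so the lengths differ.
data = text.encode("utf-8")

# Decoding with the same encoding recovers the original characters.
round_trip = data.decode("utf-8")

char_count = len(text)
byte_count = len(data)
```

The mess comes from having to remember which of the two types you are holding at every I/O boundary.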

The lack of backward compatibility among versions of Python itself reminds me how fantastic the Java releases have been in that regard. Particularly the heroic effort of retro-fitting generics.

I find the lack of proper for(;;) loops or ++ operators rather perplexing; I get that they want everything to be a first-class object, but loop ranges seem to be taking this a bit far. And the “friendly” syntax for ternary operators is an oddly verbose and syntactically contorted choice for Python (“a if cond else b”). At least they left in break/continue.
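For comparison, here is roughly how the C-style constructs translate. This is just a sketch with made-up variable names, with the C equivalent noted in the comments:

```python
words = ["alpha", "beta", "gamma"]

# C: for (i = 0; i < 10; i += 2)
evens = [i for i in range(0, 10, 2)]

# enumerate supplies the counter that i++ would otherwise maintain.
numbered = ["%d:%s" % (i, w) for i, w in enumerate(words)]

# C: parity = (n % 2 == 0) ? "even" : "odd";
n = len(words)
parity = "even" if n % 2 == 0 else "odd"
```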

The idea of executing a file on import probably makes sense for an interpreted language, but boy is it ever slow (seconds to import numpy and pymc). It does let you wrap the imports in try/except blocks, which strikes me as odd, but then I’m used to Java’s more declarative, configurable, and just-in-time import mechanism.
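A sketch of what wrapping an import looks like in practice, here treating numpy as optional. The HAVE_NUMPY flag and the mean function are hypothetical names for illustration:

```python
try:
    import numpy  # executed at import time, so failure is an ordinary exception
    HAVE_NUMPY = True
except ImportError:
    numpy = None
    HAVE_NUMPY = False


def mean(xs):
    """Average a non-empty sequence, using numpy only if it imported."""
    if HAVE_NUMPY:
        return float(numpy.mean(xs))
    # Pure-Python fallback; float() avoids integer division on old versions.
    return sum(xs) / float(len(xs))


result = mean([1, 2, 3])
```

The caller gets the same answer either way; only the speed depends on whether the import succeeded.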

Why doesn’t assignment return its value? I can’t write the usual C-style idiomatic I/O loops. There are so many opportunities for function chaining that aren’t used. It must be some kind of stylistic battle where the Pythonistas love long skinny programs more than short fat ones.
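The closest substitute I know for the C-style read loop is the two-argument form of the built-in iter, which keeps calling a function until it returns a sentinel value. A minimal sketch, with read_chunks as a made-up helper and an in-memory stream standing in for a real file:

```python
import io
from functools import partial


def read_chunks(stream, size=4):
    """Collect fixed-size chunks until read() returns the empty string."""
    # C: while ((chunk = read(stream, size)) != "") { ... }
    return list(iter(partial(stream.read, size), ""))


chunks = read_chunks(io.StringIO(u"abcdefghij"))
```

For plain line-by-line reading, iterating over the file object directly does the same job with no helper at all.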

Having to install C and Fortran-based packages takes me straight back to 1980s Unix build hell and makes me appreciate the lovely distribution mechanism that is the Java jar file. I found the Enthought distribution helpful (it’s free for academics but pay-per-year for industrialists), because it includes numpy and then the PyMC installer worked (on Windows 32-bit; couldn’t get 64-bit anything working due to GCC conflicts I didn’t have the patience to sort out).

Of course, Python’s a dog in terms of speed and memory usage compared to Java, much less to C, but at least it’s an order of magnitude faster than R.