pyhi: Python Package/Module Hello World with all the Trimmings

I’ve been teaching myself Python, and being the compulsive neat freak that I am, I first had to figure out its namespaces and how to package everything properly.

pyhi: a Demo Python Package with all the Trimmings

If you want a skeletal application that does everything the right way (as far as I can tell from the online Python style recommendations), including package/module namespaces, unit tests, installer, documentation, and packaged scripts, check out:

Of course, I’m happy to get advice if there are better ways to do this.

Why Python?

I’m working with Andrey Rzhetsky and James Evans at the University of Chicago on a version of the Bayesian (and EM) annotation models in Python. I’m also working with Matt Hoffman, Andrew Gelman and Michael Malecki on Python at Columbia for Bayesian inference. Watch this space (and Andrew’s blog) for news.

Does Python Rock?

The short story is that I learned enough in a week to already use it for scripting and munging instead of Java. Compared to Perl, it’s a positively genius work of minimalism and consistency. Everything works pretty much the way you’d expect. When you need C/Fortran back ends (all the optimization, distribution, and matrix libs), Python’s a relatively clean front end. Numpy and PyMC are nice; the design of PyMC is particularly well thought out.

I love the generators and named arguments/defaults. I hate the whitespace syntax (you can’t just move a block of code and let emacs auto-indent it). I wish I had a bit more control over types and pre-allocation, but that’s easily solved with utility functions.
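For readers who haven’t met these two features, here is a minimal sketch of a generator with default and named arguments. The function name `countdown` and its parameters are made up for illustration.

```python
def countdown(start, stop=0, step=1):
    """Yield start, start - step, ... while the value stays above stop."""
    current = start
    while current > stop:
        yield current        # execution pauses here until the next value is requested
        current -= step

# Named arguments can be given in any order; omitted ones take their defaults.
evens = list(countdown(step=2, start=10))
all_down = list(countdown(3))

print(evens)     # [10, 8, 6, 4, 2]
print(all_down)  # [3, 2, 1]
```

Nothing in the body runs until a caller asks for the first value, which is what makes generators handy for streaming through data.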

At least as of version 2.6, character strings are the usual mess (Java’s became a mess when Unicode surpassed 16-bit code points), with one type for bytes and a different one for unicode strings (sort of like Java, only there are no built-in Java types for byte-sequence literals).
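A small sketch of the two-type split, written with Python 3 names as an assumption (there `str` holds Unicode text and `bytes` holds raw bytes; in 2.6 the same roles are played by `unicode` and `str`):

```python
# Text is a sequence of code points; bytes are what an encoding produces from it.
text = "na\u00efve"              # 5 code points, including i-with-diaeresis
data = text.encode("utf-8")      # 6 bytes: the accented letter takes two

print(len(text))   # 5
print(len(data))   # 6
print(data)        # b'na\xc3\xafve'

# Decoding with the same encoding recovers the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)  # True
```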

The lack of backward compatibility among versions of Python itself reminds me how fantastic the Java releases have been in that regard. Particularly the heroic effort of retro-fitting generics.

I find the lack of proper for(;;) loops or ++ operators rather perplexing; I get that they want everything to be a first class object but loop ranges seem to be taking this a bit far. And the “friendly” syntax for ternary operators is an oddly verbose and syntactically contorted choice for Python (“a if cond else b”). At least they left in break/continue.
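For comparison, this is roughly how the C idioms mentioned above come out in Python. It is a sketch with made-up variable names, not a style recommendation.

```python
# C: for (i = 0; i < 10; i += 3) { ... }
visited = []
for i in range(0, 10, 3):      # range(start, stop, step) stands in for the loop header
    visited.append(i)
print(visited)                 # [0, 3, 6, 9]

# C: count++;
count = 0
count += 1                     # augmented assignment is a statement, not an expression

# C: label = (count > 0) ? "some" : "none";
label = "some" if count > 0 else "none"
print(label)                   # some

# C: for (n = 1; n < 100; n *= 2) { ... }
# A non-arithmetic update has no range() equivalent, so it becomes a while loop.
powers = []
n = 1
while n < 100:
    powers.append(n)
    n *= 2
print(powers)                  # [1, 2, 4, 8, 16, 32, 64]
```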

The idea to execute a file on import probably makes sense for an interpreted language, but boy is it ever slow (seconds to import numpy and pymc). It does let you wrap the imports in try/except blocks, which strikes me as odd, but then I’m used to Java’s more declarative, configurable, and just-in-time import mechanism.
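Wrapping an import in try/except usually looks something like the sketch below. The `HAVE_NUMPY` flag and the `mean` helper are hypothetical names chosen for the example; the point is only that a failed import is an ordinary exception that can be caught.

```python
try:
    import numpy as np          # optional dependency; the import executes numpy's module code
    HAVE_NUMPY = True
except ImportError:
    np = None
    HAVE_NUMPY = False

def mean(values):
    """Average a non-empty sequence, using numpy only if the import succeeded."""
    if HAVE_NUMPY:
        return float(np.mean(values))
    return sum(values) / float(len(values))

print(mean([1, 2, 3]))  # 2.0
```

The same code runs with or without numpy installed, which is about the closest Python gets to a configurable import.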

Why doesn’t assignment return its value? I can’t write the usual C-style idiomatic I/O loops. There are so many opportunities for function chaining that aren’t used. It must be some kind of stylistic battle where the Pythonistas love long skinny programs more than short fat ones.
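The usual Python substitutes for the C-style `while ((line = read()) != NULL)` loop are iteration over the file object and the two-argument form of `iter`. A sketch, using an in-memory `io.StringIO` as a stand-in for a real file:

```python
import io
from functools import partial

source = io.StringIO("alpha\nbeta\ngamma\n")

# Line-oriented input: a file object is its own iterator, one line per step.
lines = [line.rstrip("\n") for line in source]
print(lines)        # ['alpha', 'beta', 'gamma']

# Fixed-size chunks: iter(callable, sentinel) keeps calling read(4)
# until it returns the sentinel "", which signals end of input.
source.seek(0)
chunks = list(iter(partial(source.read, 4), ""))
print(len(chunks))  # 5
```

Much later versions of Python (3.8 onward) added an assignment expression, `:=`, that allows the C-style loop to be written directly.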

Having to install C and Fortran-based packages takes me straight back to 1980s Unix build hell and makes me appreciate the lovely distribution mechanism that is the Java jar file. I found the Enthought distribution helpful (it’s free for academics but pay-per-year for industrialists), because it includes numpy and then the PyMC installer worked (on Windows 32-bit; couldn’t get 64-bit anything working due to GCC conflicts I didn’t have the patience to sort out).

Of course, Python’s a dog in terms of speed and memory usage compared to Java, much less to C, but at least it’s an order of magnitude faster than R.

7 Responses to “pyhi: Python Package/Module Hello World with all the Trimmings”

  1. Yoav Says:

    For speed and leaner memory usage (as well as more control over types), you probably want to take a look at cython.

  2. Jared Murray Says:

    Second to the above note on Cython. I’ve been using it routinely for MCMC and it’s a real lifesaver. If you’re using EPD then you’ve already got it.

  3. J. Voigt Says:

    As for packaging/distribution, may be worth looking at setuptools or (better) pip.

    http://pypi.python.org/pypi/pip

    PEP 376 proposes interoperability standards for the various distribution systems going forward.

    http://www.python.org/dev/peps/pep-0376/

  4. Bob Carpenter Says:

    I think we’ll probably be sticking to C for speed and portability (in the sense of being able to write wrappers in R, for instance, as well as Python).

  5. Martin Says:

    Generators (especially when using yield statements): Awesome, right? Compare something as simple as lazily enumerating all strings of a given length using Python generators with the comparable Java code for Iterables/Iterators.
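One way the generator version of that comparison might look is sketched below. The function name `strings_of_length` is made up for illustration, and a Java counterpart would need an explicit Iterator class that carries the enumeration state by hand.

```python
from itertools import islice

def strings_of_length(alphabet, n):
    """Lazily yield every string of length n over alphabet, in alphabet order."""
    if n == 0:
        yield ""
        return
    for first in alphabet:
        # Recurse for the remaining n - 1 positions; nothing is built until requested.
        for rest in strings_of_length(alphabet, n - 1):
            yield first + rest

all_len2 = list(strings_of_length("abc", 2))
print(all_len2)  # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

# Laziness: take 5 of the 2**20 strings without generating the rest.
first_five = list(islice(strings_of_length("ab", 20), 5))
print(len(first_five))  # 5
```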

    Strings: These are done better in Python 3. Meanwhile, use unicode strings wherever you can, use a consistent source encoding for all your .py files (and put an encoding comment on the second line of each file) if they contain non-ASCII characters, and use the codecs module from the standard library for all I/O.
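A sketch of the advice above, with an encoding comment and `codecs.open` for I/O. The file name `sample.txt` and the temporary directory are assumptions made so the example is self-contained.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

text = u"caf\u00e9 na\u00efve"   # a unicode string with two non-ASCII characters

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file name

# codecs.open encodes on write and decodes on read, so the program only sees text.
with codecs.open(path, "w", encoding="utf-8") as out:
    out.write(text)

with codecs.open(path, "r", encoding="utf-8") as inp:
    read_back = inp.read()

# Reading the same file in binary mode shows the encoded bytes on disk.
with open(path, "rb") as raw:
    raw_bytes = raw.read()

print(read_back == text)  # True
print(len(raw_bytes))     # 12
```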

    Backwards compatibility: Python has been fairly consistent within each major version, but they’re not afraid of breaking things in incompatible ways between major versions. You could argue that Java has accumulated a lot of warts and cruft that they’ve decided not to throw overboard; as a result, people still use ancient collection classes in Java today. You can easily read “Java Puzzlers” as an indictment of many design mistakes in Java. The Python folks realized that some warts could be excised and decided to do so in Python 3, which of course breaks backwards compatibility.

    For-loops in Python: How are they different from for-loops in R? ;-)

    Expressions with side-effects and statements as expressions: These are often the source of subtle bugs in C. E.g. evaluation order is sometimes undefined in C (unlike Java), so subexpressions with side-effects are problematic. Assignment-as-expression allows you to mistakenly write “if (x = 1)” instead of “if (x == 1)”; though there are now compiler warnings for this. OTOH it’s a newbie mistake in Python to write ++x or --x (which are perfectly legal) and expect them to do anything other than evaluate to x.
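The point about the doubled prefix operators can be checked in a few lines; they parse as two unary operators applied in a row:

```python
x = 5
y = ++x     # parsed as +(+x): two unary pluses, no increment
z = --x     # parsed as -(-x): double negation, no decrement

print(x, y, z)  # 5 5 5

x += 1      # the actual way to increment
print(x)        # 6
```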

    Installing packages: This is usually quite easy on Ubuntu (and presumably plain Debian too). Just “sudo apt-get install python-numpy”, for example.

    Incidentally, my biggest pet peeve about Python, especially for numerical code, is the lack of floating-point control. E.g. floating point division-by-zero is always trapped and causes a Python exception, instead of returning infinity. Sometimes you want the former, but more often the latter. The lack of control over this is annoying, as is the lack of support for many of the floating point operations and constants in the C99 standard math library. This makes it harder than it should be to prototype numerical code in Python.
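To illustrate the trapping behavior described above, here is a sketch of a workaround. `safe_divide` is a hypothetical helper, not part of any library; it catches the exception and returns the value IEEE 754 would give for a zero denominator.

```python
import math

# Built-in float division traps a zero denominator instead of returning infinity.
try:
    1.0 / 0.0
    trapped = False
except ZeroDivisionError:
    trapped = True
print(trapped)  # True

def safe_divide(numerator, denominator):
    """Hypothetical helper: divide, returning inf or nan where Python would raise."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        if numerator == 0 or math.isnan(numerator):
            return float("nan")                 # 0/0 and nan/0 are nan
        # The result's sign combines the numerator's sign with the sign of the zero.
        sign = math.copysign(1.0, numerator) * math.copysign(1.0, denominator)
        return math.copysign(float("inf"), sign)

print(safe_divide(1.0, 0.0))    # inf
print(safe_divide(-1.0, 0.0))   # -inf
```

Wrapping every division this way is clumsy, which is rather the complaint.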

  6. Using R from within Python « DECISION STATS Says:

    […] pyhi: Python Package/Module Hello World with all the Trimmings (lingpipe-blog.com) […]

  7. Nikhil Chelliah Says:

    For Python code, emacs lets you use C-c > and C-c < to manually indent and dedent the selected region. This ends up being almost as convenient as the customary C-M-\ for other languages.
