[This post repeats a long comment I posted about licensing in response to Brendan O'Connor's blog entry, End-to-End NLP Packages. Brendan's post goes over some packages for NLP and singles out LingPipe as being only “quasi free.”]
Restrictive “Academic-Only” Licenses
Some of those other packages, like C&C Tools and Senna, are in the same “quasi free” category as LingPipe in the sense that they’re released under what their authors call “non-commercial” licenses. For instance, none of the Senna, C&C, or LingPipe licenses are compatible with GPL-ed code. Senna goes so far as to prohibit derived works altogether.
The LingPipe License
The intent for the LingPipe royalty-free license was a little different from the “academic use only” licenses in that we didn’t single out academia as a special class of users. We allow free use for research purposes by industrialists and academics alike. We also provide a “developers” license that explicitly grants this right, which makes some users’ organizations feel better.
Truly Free NLP Software
The other tools, like NLTK, Mallet, OpenNLP, and GATE are released under more flexible licenses (LGPL, Apache or BSD), which I really do think of as being truly “free”. Mahout’s also in this category, though not mentioned by Brendan, whereas packages like TreeTagger are more like Senna or C&C in their restrictive “academic only” licensing.
Stanford and the GPL
Stanford NLP’s license sounds like it was written by someone who didn’t quite understand the GPL. Their page says (the link is also theirs):
The Stanford CoreNLP code is licensed under the full GPL, which allows its use for research purposes, free software projects, software services, etc., but not in distributed proprietary software.
Technically, what they say is true. It would’ve been clearer if they’d replaced “research” with “research and non-research” and “free” with “free and for-profit”. Instead, their choice of examples suggests that “free” or “research” use has some special status under the GPL, which it doesn’t. With my linguist hat on, I’d say their text leads the reader to a false implicature. The terms “research” and “academia” don’t even show up in the GPL, and although “free” does, GNU and others clarify elsewhere that it means “free as in free speech”, not “free as in free beer”.
Understanding the GPL
The key to understanding the GPL lies behind Stanford’s embedded link to
Here, “proprietary” doesn’t have to do with ownership, but rather with closed source. Basically, if you redistribute source code or an application based on GPL-ed code, you also have to release your code under the GPL, which is why it’s called a “copyleft” or “viral” license. In some cases, you can get away with using a less restrictive license like LGPL or BSD for your modifications or interacting libraries, though you can’t change the license of the underlying GPL-ed source.
GPL Applies to Academics, Too
There’s no free ride for academics here: you can’t take GPL-ed code, use it to build a research project for your thesis, then give an executable away for free without also releasing your code under a compatible license. Nor can you restrict that license to research-only use. Similarly, you couldn’t roll a GPL-ed library into Senna or C&C or LingPipe and redistribute the result under their own licenses. Academics often violate these terms because they somehow think “research use only” is special.
Services Based on GPL-ed Software and the AGPL
You can also set up a software service, for example on Amazon’s Elastic Compute Cloud (EC2) or on your own servers, that’s entirely driven by GPL-ed software, like say Stanford NLP or Weka, and then charge users for accessing it. Because you’re not redistributing the software itself, you can modify it any way you like and write code around it without releasing your own software. GNU introduced the Affero GPL (AGPL), a license even more restrictive than the GPL, to try to close this server loophole in the basic GPL.
Charging for GPL-ed Code
You can charge for GPL-ed code if you can find someone to pay you. That’s what Red Hat’s doing with Linux, what Revolution R’s doing with R, and what Enthought’s doing with Python.
LingPipe’s Business Model is Like MySQL’s
Note that this is not what MySQL did with MySQL (before the company was sold to Sun, which Oracle later acquired), nor is it what we do with LingPipe. In both cases, the company owns all the intellectual property and copyrights and is thus able to release the code under multiple licenses. This strategy is explained in the Wikipedia article on multi-licensing.
We license LingPipe under custom licenses as well as under our royalty-free license. These custom licenses include all sorts of additional restrictions (like only using some of the modules on so many servers) and additional guarantees (like indemnification and maintenance). Don’t ask me about the details; that’s Breck’s bailiwick. Suffice it to say that most companies don’t like to get involved with copyleft, be it from the GPL or from LingPipe’s royalty-free license. So we let them pay us extra for an unencumbered license, so they can do what they want with LingPipe without having to share their code. We’ve had more than one customer buy a commercial license for LingPipe who wouldn’t even tell us what they were going to do with our software.
Free “Academic” Software
Also, keep in mind that as an academic, your university (or lab) probably has a claim to your intellectual property developed using their resources. Here’s some advice from GNU on that front: