Positive, Negative and Neutral Sentiment

“Classical” sentiment analysis, as defined in Pang and Lee’s seminal experiments, classifies reviews into two categories: positive and negative. Pang and Lee created training data from Rotten Tomatoes reviews, which are published with stars. Their data consisted of two classes representing negative and positive reviews. Neutral reviews (those getting 3/5 stars) were not included in the data set, and thus the resulting systems aren’t tested on their ability to reject neutral reviews as being neither positive nor negative.

Here’s what I’ve been suggesting: use two classifiers, one for positive/non-positive and one for negative/non-negative. Then you get a 4-way classification into positive (+pos,-neg), negative (-pos,+neg), mixed (+pos,+neg) and neutral (-pos,-neg). The problem here is that I need data to try it out. This level of annotation can’t be extracted from review text plus star rating. I need to know sentence-by-sentence sentiment.
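The four-way combination itself is simple enough to sketch in a few lines of Python. This assumes the two binary decisions have already been made by whatever classifiers you like; the function name `combine_sentiment` and the label strings are made up for illustration.

```python
def combine_sentiment(is_positive: bool, is_negative: bool) -> str:
    """Map two independent binary decisions onto four sentiment classes.

    is_positive: output of a positive/non-positive classifier (hypothetical).
    is_negative: output of a negative/non-negative classifier (hypothetical).
    """
    if is_positive and is_negative:
        return "mixed"      # (+pos, +neg)
    if is_positive:
        return "positive"   # (+pos, -neg)
    if is_negative:
        return "negative"   # (-pos, +neg)
    return "neutral"        # (-pos, -neg)
```

Because the two classifiers are trained separately, nothing stops both from firing on the same text, which is exactly what makes the mixed category fall out for free.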

Someone just forwarded me a pointer to a 2005 IJCAI paper that actually takes neutral sentiments seriously:

M. Koppel and J. Schler (2005) Using Neutral Examples for Learning Polarity. In IJCAI.

What Koppel and Schler built was a three-way classifier for positive, negative and neutral sentiment. They do it by combining binary classifiers into n-way classifiers using a standard round-robin approach. Specifically, they build three binary classifiers: positive/negative, positive/neutral, and negative/neutral. Then they run all three and take a vote (with some tie-breaking scheme). They use SVMs, but any binary classifier may be used.
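A rough sketch of round-robin voting in Python follows. It is not Koppel and Schler's implementation: the keyword-counting classifiers stand in for trained SVMs, the word lists are invented, and falling back to neutral on a three-way tie is my assumption, since I haven't described their tie-breaking scheme here.

```python
from collections import Counter
from typing import Callable, Iterable

# Toy lexicons, purely illustrative stand-ins for trained models.
POS_WORDS = {"great", "good", "love"}
NEG_WORDS = {"bad", "awful", "hate"}

def score(text: str) -> int:
    """Positive keyword count minus negative keyword count."""
    tokens = text.lower().split()
    return sum(t in POS_WORDS for t in tokens) - sum(t in NEG_WORDS for t in tokens)

# Three pairwise classifiers; each can only answer with one of its two classes.
def pos_vs_neg(text: str) -> str:
    return "positive" if score(text) >= 0 else "negative"

def pos_vs_neu(text: str) -> str:
    return "positive" if score(text) > 0 else "neutral"

def neg_vs_neu(text: str) -> str:
    return "negative" if score(text) < 0 else "neutral"

def round_robin_vote(text: str,
                     classifiers: Iterable[Callable[[str], str]],
                     tie_break: str = "neutral") -> str:
    """Run every pairwise classifier and return the class with the most votes.

    If the top two classes have equal votes, return tie_break (an assumed
    fallback, not the scheme from the paper).
    """
    votes = Counter(clf(text) for clf in classifiers)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]

PAIRWISE = (pos_vs_neg, pos_vs_neu, neg_vs_neu)
```

Calling `round_robin_vote("a great movie", PAIRWISE)` collects two votes for positive and one for neutral, so positive wins. With three classes and three pairwise classifiers, the only possible tie is one vote each.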

This approach is generalizable. You can slice and dice the categories differently. What I suggested was actually combining two classifiers, each of which would be trained on all the data: positive/neutral+negative and negative/neutral+positive. You can go further and fully expand all the combinations, adding positive+negative/neutral to the three pairwise problems and the two I suggested, which rounds out the set of six binary classification problems. You can then just add the larger categories into the vote.
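To make the count of six concrete, here is a small Python sketch that enumerates the binary problems and tallies votes over them. How a vote for a merged category such as neutral+negative should count is not something I've pinned down; splitting it evenly among the member classes is just one assumption, and the names `binary_problems` and `tally` are made up.

```python
from itertools import combinations

LABELS = ("positive", "negative", "neutral")

def binary_problems(labels=LABELS):
    """List every binary problem: three pairwise plus three one-vs-rest."""
    problems = []
    # Pairwise problems, e.g. positive/negative.
    for a, b in combinations(labels, 2):
        problems.append((frozenset([a]), frozenset([b])))
    # One-vs-rest problems, e.g. positive/neutral+negative.
    for a in labels:
        rest = frozenset(label for label in labels if label != a)
        problems.append((frozenset([a]), rest))
    return problems

def tally(winning_sides):
    """Score each label from the winning side of each binary problem.

    Assumption: a win for a merged side is split evenly among its members.
    """
    scores = {label: 0.0 for label in LABELS}
    for side in winning_sides:
        for label in side:
            scores[label] += 1.0 / len(side)
    return scores
```

For example, if positive wins one problem and neutral+negative wins another, `tally` gives positive a full vote and neutral and negative half a vote each.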

On another note, I’d actually like to get the time to take my proposed approach and build it into a hierarchical model, like McDonald et al.’s hierarchical SVM-based approach. I’d use something like latent Dirichlet allocation (LDA, coming soon to LingPipe) instead of SVMs, so I could predict posterior probabilities, but that’s a relatively trifling detail compared to the overall structure of the model. It would actually be possible to partially supervise LDA, or the whole model could be induced as latent structure from the top-level review ratings. Even more fun would ensue if we could use a kind of hierarchical LDA, with a level dedicated to overall sentiment and then another level per genre/domain (this’d be hierarchical on the word distributions, not the topic distributions as in standard hierarchical LDA).

One Response to “Positive, Negative and Neutral Sentiment”

  1. Craig Macdonald Says:

    You might be interested to know that TREC have been running an opinion finding task in the Blog track since TREC 2006, and the research outcomes of this track are beginning to appear in IR research conferences.
