
Mitzi and I were talking and she said she loved … “corn chips”. Hmm. I was expecting “cheese”, which is what she’s usually eating in the other room at this time. So I was primed to think “Wallace and Gromit”. But I couldn’t remember which of the pair, Wallace or Gromit, was which. I remember the characters. One’s a head-in-the-clouds human cheese lover, the other a clever pooch.
Back when I worked on semantics, this is just the kind of data that’d get me very excited. Why? Because it’s inconsistent with theories of reference, like the one I put forward in my semantics book.
My theory of plurals would have it that to understand a conjunction like “Wallace and Gromit”, you first identify a referent for each of the conjuncts, “Wallace” and “Gromit”, which can then be used to pick out a group.
In this case, I know the two members of the group; I just don’t know which one has which name.
But maybe “Wallace and Gromit” as a whole is a name. That is, maybe it’s a frozen expression, at least for me. Lots of names are like that for me, like “Johnson and Johnson”. Speaking of Johnson and Johnson, could they be just one person? That is, could both conjuncts refer to the same person? It probably mostly refers to a company as a fixed expression these days.
At one point, “Johnson and Johnson” would’ve caused confusion for named-entity detectors (a conjunction of two person names, or a combined company name; annotation standards like Genia’s Technical Term Annotation let you keep both). This is a problem for us now in our high-recall entity extraction with terms like “insulin receptor”: is that a reference to insulin (one thing), or to the receptor (another thing)?
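Here’s a toy sketch, in Python, of what keeping both readings looks like in practice (the sentence, offsets, and type labels are invented for illustration; this isn’t our actual extraction code):

from dataclasses import dataclass

@dataclass
class Mention:
    start: int        # character offset of the first character in the span
    end: int          # offset one past the last character in the span
    text: str         # the surface string of the span
    entity_type: str  # coarse semantic type of this candidate reading

sentence = "The drug binds the insulin receptor."

# Keep both readings: "insulin" on its own and the larger term it sits inside.
candidates = [
    Mention(start=19, end=26, text="insulin", entity_type="PROTEIN"),
    Mention(start=19, end=35, text="insulin receptor", entity_type="PROTEIN"),
]

for m in candidates:
    assert sentence[m.start:m.end] == m.text  # the spans really do overlap
    print(m)

The point is just that a high-recall extractor can emit both spans and leave the one-thing-or-two question to a later disambiguation or linkage stage.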

Mitzi’s a virtual font of referential uncertainty data tonight. She said she knew that “Abbott and Costello” (the comedy team) had first names “Lou” and “Bud”, but she didn’t know which went with which last name (hint: that’s Lou Costello on the left and Bud Abbott on the right).
March 26, 2009 at 9:39 pm
I can give another interesting example of a conjunction from the documents of the United Nations: “draft resolution A/56/L.28 and Add.1”. Notice the singular “resolution”, even though we obviously have two documents here.
I would love to see your take on this expression. I wrote down my understanding of it on my own blog: http://blog.outerthoughts.com/2009/03/conjunctions-in-named-entities/
March 27, 2009 at 8:56 am
It’s not at all obvious to me that “draft resolution A/56/L.28 and Add.1” is two documents. This is because the physical form of the documents is not necessarily the focus of the discussion. Instead, what matters is that there is a resolution under consideration. There may be two sheets of paper, but everyone knows that there is just one thing you could vote on.
Domain knowledge says that Add.1 is probably an addendum or addition. But Add.1 could easily be something that is totally dependent on the content of A/56/L.28, like:
“remove Wensleydale from the list of cheeses in paragraph 4”
In that case, the effective draft resolution is indeed A/56/L.28 and Add.1, and it would be coherent for people to say stuff like “I would be in favour of A/56/L.28, which isn’t on the table, but must vote against A/56/L.28 and Add.1, which is.” As usual, the deal with named entities is that people are using language to talk about the things they care about, and there is some slop.
The question of how a bureaucracy formulates and uses the rules for building resolutions and voting on them is probably a research topic for someone, but might be a tough sell as a contribution to linguistic semantics. My sense is that there are two semi-formal systems operating here: the linguistic system that defines how the mapping to meaning usually works and the legal framework that explains how you resolve disputes about what was voted on. There are no strong reasons to expect that these will align exactly. To understand a usage, however, you will sometimes have to consider both.
OK. Now I’ll read what Alexandre says.
March 27, 2009 at 9:04 am
After reading Alexandre, I see that a simpler way of saying what I just said is this:
They could have written
draft resolution A/56/L.28 as modified by Add.1
instead of
draft resolution A/56/L.28 and Add.1
but they didn’t have to, because shared knowledge of the pragmatics of the situation gives “and” the right force anyway.
March 28, 2009 at 3:05 pm
Thanks Chris,
My interest and comment were in the context of NE detection, which is what the original article was about. From that point of view, I suspect any current algorithm would treat the text as two references, and any one-to-two mapping would be non-trivial at or after the coreference stage.
March 28, 2009 at 11:57 pm
Well, I thought Bob’s article pointed out that even a well-fleshed-out formal theory of semantics misses some tricks on examples with Gromit indeterminacy. So I guess it resonates differently for different people.
This may be absurdly old-school, but it seems to me that to have a named entity to detect in the first place you need three things:
– a name (we can safely assume that this is a subsequence of the document it is in)
– an entity (what’s that?)
– the appropriate relation between the name and the entity (what’s that?)
Thus, in my view, any system that does named entity detection is in effect taking some kind of position on the latter two aspects. Either it works with an explicit theory of these things (in which case, don’t forget to consult Montague and his successors), or its theory is implicit and must be deduced from its behaviour before it can be evaluated. In either case, the question of exactly what we want from an NE detector seems worthwhile. And that, I expect, will be deeply and irrevocably application dependent.
March 29, 2009 at 1:14 pm
I started writing the blog post with old-school referential semantics in mind. By “old school”, I mean the kind of semantics that would’ve been familiar to Bertrand Russell.
It only dawned on me as I was writing that we’re facing many of the same problems in our named-entity detection and database linkage, which are just a kind of syntax and semantics.
The tie-in is that developing an annotation standard for a machine learning task is tantamount to developing a traditional kind of syntactic and semantic theory.
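To make that concrete, here’s roughly what a single annotation record ends up encoding once database linkage enters the picture (a hypothetical Python sketch, not any existing standard’s schema):

from dataclasses import dataclass
from typing import Optional

@dataclass
class NamedEntityAnnotation:
    # the name: a span of the document it occurs in
    doc_id: str
    start: int
    end: int
    surface: str
    # the entity: operationalized as an identifier in some external
    # database (a gene id, a company registry number, a wiki page, ...)
    entity_id: Optional[str]
    # the relation between name and entity; "refers-to" is the usual
    # default, but fixed expressions like "Johnson and Johnson" and
    # fictional characters like Gromit both strain that single label
    relation: str = "refers-to"

# an illustrative instance with made-up offsets and a made-up identifier
gromit = NamedEntityAnnotation(
    doc_id="blog-post", start=42, end=48, surface="Gromit",
    entity_id="wikipedia:Gromit",
)

The span fields are the easy, syntactic part; deciding what can serve as an entity_id and what “refers-to” is supposed to mean is exactly where the coding standard turns into a semantic theory.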
I’ve been meaning to write a longer post on this topic — corpus annotation coding standards are the new home of the kind of linguistics I always liked, which is more example driven than theory driven.
I didn’t even go into the fact that Wallace and Gromit are fictional, which in itself presents a non-trivial semantic puzzle.
[Punctuation and Semantics Exercise: Explain the alternate hyphenations of “old school” and the quotes in the first paragraph of this response.]
April 1, 2009 at 12:01 am
That’s Lou Costello on the left.
April 1, 2009 at 10:32 am
Doh! That is Lou Costello on the left. Pardon me while I go change the original post.