Joint Referential (Un)Certainty: The “Wallace and Gromit” Dilemma

Mitzi and I were talking and she said she loved … “corn chips”. Hmm. I was expecting “cheese”, which is what she’s usually eating in the other room at this time. So I was primed to think “Wallace and Gromit”. But I couldn’t remember which of the pair, Wallace or Gromit, was which. I remember the characters. One’s a head-in-the-clouds human cheese lover, the other a clever pooch.

Back when I worked on semantics, this is just the kind of data that’d get me very excited. Why? Because it’s inconsistent with theories of reference, like the one I put forward in my semantics book.

My theory of plurals would have it that to understand a conjunction like “Wallace and Gromit”, you first identify a referent for each of the conjuncts, “Wallace” and “Gromit”, which can then be used to pick out a group.

In this case, I know the two members of the group, I just don’t know which had which name.
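To make the mismatch concrete, here is a minimal Python sketch. It is only an illustration, not the formal theory from the book; the function names, the empty lexicon, and the referent descriptions are all made up for this example. Resolving conjunct-by-conjunct fails outright when neither name is known individually, whereas a joint view keeps the group fixed and leaves only the name-to-member assignment open.

```python
from itertools import permutations


def conjunct_first(names, lexicon):
    """Resolve each conjunct on its own, then collect the referents into a group.

    Returns None as soon as one name has no known referent, which is what a
    strictly conjunct-by-conjunct account predicts for this hearer.
    """
    referents = []
    for name in names:
        if name not in lexicon:
            return None
        referents.append(lexicon[name])
    return frozenset(referents)


def joint_readings(names, group):
    """Enumerate every one-to-one assignment of the names to the group members."""
    members = sorted(group)
    return [dict(zip(names, perm)) for perm in permutations(members)]


names = ["Wallace", "Gromit"]
lexicon = {}  # hypothetical hearer: neither name resolves on its own
group = {"cheese-loving human", "clever dog"}  # but the pair itself is known

# Conjunct-first resolution has nothing to build the group from.
print(conjunct_first(names, lexicon))

# Joint resolution: the group is certain, only the assignment is not.
readings = joint_readings(names, group)
for reading in readings:
    print(reading)
```

Every reading picks out the same two-member group, so the reference of the whole conjunction is settled even though the reference of each conjunct is not.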

But maybe “Wallace and Gromit” as a whole is a name. That is, maybe it’s a frozen expression, at least for me. Lots of names are like that for me, like “Johnson and Johnson”. Speaking of Johnson and Johnson, could they be just one person? That is, could both conjuncts refer to the same person? It probably mostly refers to a company as a fixed expression these days.

At one point, “Johnson and Johnson” would’ve caused confusion for named entity detectors (conjunction of two person names, or a combined company name; annotation standards like Genia’s Technical Term Annotation let you keep both). This is a problem for us now in our high recall entity extraction with terms like “insulin receptor” — is that a reference to insulin (one thing), or the receptor (another thing)?
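One way to picture "keeping both" is to let annotations overlap instead of forcing a single segmentation. The sketch below is a toy in Python, not Genia's actual annotation format and not our extraction code; the `Span` class, the label names, and the token offsets are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A candidate mention: token offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


def text_of(span, tokens):
    """Return the surface text a span covers."""
    return " ".join(tokens[span.start:span.end])


def overlaps(a, b):
    """True if two spans share at least one token."""
    return a.start < b.end and b.start < a.end


tokens = ["Johnson", "and", "Johnson", "makes", "bandages"]

# High recall: keep every plausible reading instead of choosing one.
candidates = [
    Span(0, 3, "ORG"),     # the whole conjunction as a company name
    Span(0, 1, "PERSON"),  # first conjunct as a person name
    Span(2, 3, "PERSON"),  # second conjunct as a person name
]

for span in candidates:
    rivals = [other for other in candidates if other != span and overlaps(span, other)]
    print(text_of(span, tokens), span.label, "competes with", len(rivals), "other span(s)")
```

The same representation would cover “insulin receptor”: one span over both tokens for the receptor and a nested span over the first token for insulin, with the choice between them deferred to whatever consumes the annotations.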

Mitzi’s a virtual font of referential uncertainty data tonight. She said she knew that “Abbott and Costello” (the comedy team) had first names “Lou” and “Bud”, but she didn’t know which went with which last name (hint: that’s Lou Costello on the left and Bud Abbott on the right).

8 Responses to “Joint Referential (Un)Certainty: The “Wallace and Gromit” Dilemma”

  1. Alexandre Rafalovitch Says:

    I can give another interesting example of a conjunction from the documents of the United Nations: “draft resolution A/56/L.28 and Add.1”. Notice the singular “resolution”, even though we obviously have two documents here.

    I would love to see your take on this expression. I wrote down my understanding of it on my own blog: http://blog.outerthoughts.com/2009/03/conjunctions-in-named-entities/

  2. Chris Brew Says:

    It’s not at all obvious to me that “draft resolution A/56/L.28 and Add.1” is two documents. This is because the physical form of the documents is not necessarily the focus of the discussion. Instead, what matters is that there is a resolution under consideration. There may be two sheets of paper, but everyone knows that there is just one thing you could vote on.

    Domain knowledge says that Add.1 is probably an addendum or addition. But Add.1 could easily be something that is totally dependent on the content of A/56/L.28, like:

    “remove Wensleydale from the list of cheeses in paragraph 4”

    In that case, the effective draft resolution is indeed A/56/L.28 and Add.1, and it would be coherent for people to say stuff like “I would be in favour of A/56/L.28, which isn’t on the table, but must vote against A/56/L.28 and Add.1, which is”. As usual, the deal with named entities is that people are using language to talk about the things they care about, and there is some slop.

    The question of how a bureaucracy formulates and uses the rules for building resolutions and voting on them is probably a research topic for someone, but might be a tough sell as a contribution to linguistic semantics. My sense is that there are two semi-formal systems operating here: the linguistic system that defines how the mapping to meaning usually works and the legal framework that explains how you resolve disputes about what was voted on. There are no strong reasons to expect that these will align exactly. To understand a usage, however, you will sometimes have to consider both.

    OK. Now I’ll read what Alexandre says

  3. Chris Brew Says:

    After reading Alexandre, I see that a simpler way of saying what I just said is:

    They could have written

    draft resolution A/56/L.28 as modified by Add.1

    instead of

    draft resolution A/56/L.28 and Add.1

    but they didn’t have to, because shared knowledge of the pragmatics of the situation gives “and” the right force anyway.

  4. Alexandre Rafalovitch Says:

    Thanks Chris,

    My interest and comment were in the context of NE detection, which is what the original article was about. From that point of view, I suspect any current algorithm would treat the text as two references, and any one->two mapping would be non-trivial at or after the co-reference stage.

  5. Chris Brew Says:

    Well, I thought Bob’s article pointed out that even a well-fleshed-out formal theory of semantics misses some tricks on examples with Gromit indeterminacy. So I guess it resonates differently for different people.

    This may be absurdly old-school, but it seems to me that to have a named entity to detect in the first place you need three things.

    – a name (we can safely assume that this is a subsequence of the document it is in)
    – an entity (what’s that?)
    – the appropriate relation between the name and the entity (what’s that?)

    Thus, in my view, any system that does named entity detection is in effect taking some kind of position on the second two aspects. Either it works with an explicit theory of these things (in which case, don’t forget to consult Montague and his successors), or its theory is implicit, and must be deduced from its behaviour before it can be evaluated. In either case the question of exactly what we want from an NE detector seems worthwhile. And that, I expect, will be deeply and irrevocably application-dependent.

  6. lingpipe Says:

    I started writing the blog post with old-school referential semantics in mind. By “old school”, I mean the kind of semantics that would’ve been familiar to Bertrand Russell.

    It only dawned on me as I was writing that we’re facing many of the same problems in our named-entity detection and database linkage, which are just a kind of syntax and semantics.

    The tie-in is that developing an annotation standard for a machine learning task is tantamount to developing a traditional kind of syntactic and semantic theory.

    I’ve been meaning to write a longer post on this topic — corpus annotation coding standards are the new home of the kind of linguistics I always liked, which is more example driven than theory driven.

    I didn’t even go into the fact that Wallace and Gromit are fictional, which in itself presents a non-trivial semantic puzzle.

    [Punctuation and Semantics Exercise: Explain the alternate hyphenations of “old school” and the quotes in the first paragraph of this response.]

  7. Mark Dominus Says:

    That’s Lou Costello on the left.

  8. lingpipe Says:

    Doh! That is Lou Costello on the left. Pardon me while I go change the original post.
