Drunk Bavarian Corpus


I was amused by ELRA’s release of:

The description says “ALC contains recordings of speakers that are either intoxicated or sober.” Is there a third kind of person?

I assume they’re serious, because none of the posts are dated April 1.

The price is 1000 euros, give or take, plus VAT; ELRA jitters its prices, so the real price is 1020 euros plus VAT, though you might see if they’ll take 1019.99.

There’s even a sample labeled recording (headset, intoxicated).

Here’s the README, including an ASCII art logo.

Or you can read the paper:

What’s next, the Drunk American College Student Corpus from LDC?

5 Responses to “Drunk Bavarian Corpus”

  1. Ken Says:

    Would be more fun to make your own.

  2. Brendan O'Connor Says:

    I bet higher BAC correlates with higher language model likelihoods: simpler language is easier to predict.

    • lingpipe Says:

      Ah, but more clearly articulated language is easier to understand bottom-up, and with HMM-type speech recognizers, it’s all about the p(words|acoustics) being proportional to p(words) * p(acoustics|words). That is, a mix of the top-down predictions in the form of a language model p(words), and bottom-up predictions in the form of an acoustic model p(acoustics|words).
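
      As a rough illustration of that decomposition, here is a minimal sketch of picking the transcript that maximizes log p(words) + log p(acoustics|words). The function name `decode`, the two candidate transcripts, and every probability are invented for illustration; they are not drawn from the corpus or from any real recognizer.

```python
import math

def decode(candidates, lm_logprob, am_logprob):
    """Return the candidate maximizing log p(words) + log p(acoustics|words)."""
    return max(candidates, key=lambda w: lm_logprob[w] + am_logprob[w])

candidates = ["recognize speech", "wreck a nice beach"]

# Toy language model p(words): hypothetical numbers, not estimated from data.
lm = {"recognize speech": math.log(0.6), "wreck a nice beach": math.log(0.4)}

# Toy acoustic scores p(acoustics|words) for two hypothetical recordings.
am_clear = {"recognize speech": math.log(0.5), "wreck a nice beach": math.log(0.5)}
am_slurred = {"recognize speech": math.log(0.2), "wreck a nice beach": math.log(0.8)}

# With uninformative acoustics the language model decides the outcome;
# with strongly skewed acoustic scores the acoustic model overrides it.
best_clear = decode(candidates, lm, am_clear)
best_slurred = decode(candidates, lm, am_slurred)
```

      In this toy setup the language model breaks the tie for the first recording, while the acoustic score dominates for the second, which is the interplay of top-down and bottom-up evidence described above.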

      Perhaps the real point is that the acoustic models, even the fairly large Gaussian mixture models for p(acoustics|words), are closely fitted to their acoustic environments and population of speakers.

      To make the task really challenging, they’d have done the recordings in a Biergarten.

  3. woo Says:

    This is relevant for forensic speaker recognition applications. For previous studies on this topic, see for example

    Künzel, HJ; Braun, A; Eysholdt, U (1992) “Einfluss von Alkohol auf Sprache und Stimme” (Influence of alcohol on language and voice). Kriminalistik Verlag

    In the forensic context it’s important to know to what extent common forensic phonetic parameters differ from those of normal, uninfluenced speech.

  4. Jochen L. Leidner Says:

    Incidentally the Bavarian speech archive BAS that curated this corpus is just around the corner from a wonderful Munich beer garden…
