Automatic MT Evaluation

Lecture 4: Automatic MT Evaluation

Lecture video: web TODO
YouTube: https://www.youtube.com/watch?v=Bj_Hxi91GUM


An inherent issue with MT evaluation is that there is usually more than one correct translation. In fact, several experiments[1][2] show that a single sentence can have hundreds of thousands or even millions of correct translations.

Such a high number of possible translations is mainly due to the flexibility of lexical choice and word order. For example, the German word "Arbeiter" can be translated into English as "worker" or "employee". Every such decision multiplies the number of possible translations, so their count grows exponentially with sentence length.
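
To make this combinatorial growth concrete, here is a minimal Python sketch; the word lists are illustrative assumptions, not taken from the lecture. Each position in a toy sentence has a few acceptable renderings, and the number of complete translations is the product of the choices:

 from itertools import product
 
 # Toy example: each slot holds the acceptable renderings of one position.
 # These word lists are hypothetical, chosen only to illustrate the point.
 options = [
     ["the"],
     ["worker", "employee"],            # e.g. German "Arbeiter"
     ["was", "has been"],
     ["dismissed", "fired", "let go"],
 ]
 
 # Every independent choice multiplies the number of translations,
 # so the total grows exponentially with sentence length.
 total = 1
 for slot in options:
     total *= len(slot)
 print(total)  # 1 * 2 * 2 * 3 = 12
 
 # List a few of the variants explicitly.
 for words in list(product(*options))[:5]:
     print(" ".join(words))

With only four positions the toy sentence already has 12 correct variants; a realistic sentence with dozens of such decisions quickly reaches the huge counts reported in the cited experiments.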

Despite this fact, when we train or evaluate translation systems, we often rely on just a single reference translation.
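
A quick illustration of why this hurts (hypothetical sentences, with plain unigram precision standing in for a real metric): a correct translation that happens to use different words scores poorly against a single reference.

 # Minimal sketch: word overlap against one reference penalizes
 # a perfectly valid alternative translation.
 reference = "the worker was dismissed".split()
 hypothesis = "the employee was fired".split()
 
 overlap = sum(1 for word in hypothesis if word in reference)
 precision = overlap / len(hypothesis)
 print(f"unigram precision against one reference: {precision:.2f}")  # 0.50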

Translation Evaluation Campaigns

There are several academic workshops in which the quality of translation systems is compared. Such "competitions" require manual evaluation, and their methodology keeps evolving to make the results as fair and statistically sound as possible. The most prominent ones include:

Workshop on Statistical Machine Translation (WMT)

International Workshop on Spoken Language Translation (IWSLT)

References

  1. Ondřej Bojar, Matouš Macháček, Aleš Tamchyna, Daniel Zeman. Scratching the Surface of Possible Translations
  2. Markus Dreyer, Daniel Marcu. HyTER: Meaning-Equivalent Semantics for Translation Evaluation