Automatic MT Evaluation

Lecture 4: Automatic MT Evaluation

Lecture video: web TODO
YouTube: https://www.youtube.com/watch?v=Bj_Hxi91GUM


An inherent issue with MT evaluation is that there is usually more than one correct translation. In fact, several experiments[1][2] show that a single sentence can have hundreds of thousands or even millions of correct translations.

Such a high number of possible translations is mainly due to the flexibility of lexical choice and word order. For example, the German word "Arbeiter" can be translated into English as "worker" or "employee". Every such decision multiplies the number of possible translations, so their count grows exponentially with sentence length.
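
To make this combinatorial growth concrete, here is a minimal Python sketch; the word lists are illustrative assumptions, not taken from the lecture. Each position in a toy sentence has a few acceptable renderings, and the number of complete translations is the product of the choices:

 from itertools import product
 
 # Toy example: each slot holds the acceptable renderings of one position.
 # These word lists are hypothetical, chosen only to illustrate the point.
 options = [
     ["the"],
     ["worker", "employee"],            # e.g. German "Arbeiter"
     ["was", "has been"],
     ["dismissed", "fired", "let go"],
 ]
 
 # Every independent choice multiplies the number of translations,
 # so the total grows exponentially with sentence length.
 total = 1
 for slot in options:
     total *= len(slot)
 print(total)  # 1 * 2 * 2 * 3 = 12
 
 # List a few of the variants explicitly.
 for words in list(product(*options))[:5]:
     print(" ".join(words))

With only four positions the toy sentence already has 12 correct variants; a realistic sentence with dozens of such decisions quickly reaches the huge counts reported in the cited experiments.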

Despite this fact, when we train or evaluate translation systems, we often rely on just a single reference translation.
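
A quick illustration of why this hurts (hypothetical sentences, with plain unigram precision standing in for a real metric): a correct translation that happens to use different words scores poorly against a single reference.

 # Minimal sketch: word overlap against one reference penalizes
 # a perfectly valid alternative translation.
 reference = "the worker was dismissed".split()
 hypothesis = "the employee was fired".split()
 
 overlap = sum(1 for word in hypothesis if word in reference)
 precision = overlap / len(hypothesis)
 print(f"unigram precision against one reference: {precision:.2f}")  # 0.50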

Translation Evaluation Campaigns

There are several academic workshops in which the quality of translation systems is compared. Such "competitions" require manual evaluation, and their methodology keeps evolving to make the results as fair and statistically sound as possible. The most prominent ones include:

Workshop on Statistical Machine Translation (WMT)

International Workshop on Spoken Language Translation (IWSLT)

References

  1. Ondřej Bojar, Matouš Macháček, Aleš Tamchyna, Daniel Zeman. Scratching the Surface of Possible Translations
  2. Markus Dreyer, Daniel Marcu. HyTER: Meaning-Equivalent Semantics for Translation Evaluation