Automatic MT Evaluation

Reference Translations

The following picture^[1] illustrates the issue of reference translations:

Out of all possible sequences of words in the given language, only some are grammatically correct sentences ($G$). An overlapping set is formed by understandable translations ($T$) of the source sentence (note that these are not necessarily grammatical). Possible reference translations can then be viewed as a subset of $G \cap T$. Only some of these can be reached by the MT system. Typically, we have only a few reference translations at our disposal; often there is just a single one.

PER

Position-independent error rate^[2] (PER) is a simple measure which counts the number of words which are identical in the MT output and the reference translation and divides
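As an illustration of the position-independent matching described above, here is a minimal Python sketch. It assumes whitespace tokenization and one commonly used formulation, PER = 1 - (matches - max(0, |hyp| - |ref|)) / |ref|. Neither the normalization nor the function name `position_independent_error_rate` comes from the text above; both are assumptions made for the example.

```python
from collections import Counter


def position_independent_error_rate(hypothesis: str, reference: str) -> float:
    """Sketch of PER under an assumed, commonly used formulation.

    Word order is ignored: both sentences are treated as bags of words.
    """
    hyp_tokens = hypothesis.split()  # assumption: whitespace tokenization
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")

    # Multiset intersection: a word counts as matched at most as many
    # times as it occurs in both the hypothesis and the reference.
    matches = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())

    # Assumed penalty for hypothesis words beyond the reference length.
    surplus = max(0, len(hyp_tokens) - len(ref_tokens))

    return 1.0 - (matches - surplus) / len(ref_tokens)


if __name__ == "__main__":
    # Same words in a different order: no position-independent errors.
    print(position_independent_error_rate(
        "the cat sat on the mat", "on the mat the cat sat"))  # 0.0
    # Half of the reference words are missing from the hypothesis.
    print(position_independent_error_rate(
        "the cat sat", "the cat sat on the mat"))  # 0.5
```

Because the comparison uses bags of words, a hypothesis that merely reorders the reference gets a PER of 0, which is the sense in which the measure is position-independent.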

BLEU

^[3]

References

[1] Ondřej Bojar, Matouš Macháček, Aleš Tamchyna, Daniel Zeman. Scratching the Surface of Possible Translations
[2] C. Tillmann, S. Vogel, H. Ney, A. Zubiaga, H. Sawaf. Accelerated DP Based Search for Statistical Translation
[3] Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU: a Method for Automatic Evaluation of Machine Translation

Lecture 4: Automatic MT Evaluation
