MT that Deceives

From MT Talks
Revision as of 15:14, 5 November 2014 by Tamchyna (talk | contribs)

Many popular MT systems, such as Google Translate or Bing Translator (for certain languages), are based purely on statistical models. Such models observe word and phrase co-occurrences in parallel texts and try to learn translation equivalents.
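To make the idea of learning from co-occurrences concrete, here is a minimal sketch. The three-sentence Czech-English corpus is invented for illustration, and the Dice coefficient is a deliberately simple stand-in for the word alignment models that real statistical systems use; none of the names below come from an actual MT toolkit.

```python
from collections import Counter

# Hypothetical toy parallel corpus (Czech, English), made up for this sketch.
corpus = [
    ("mám kočku", "I have a cat"),
    ("mám psa", "I have a dog"),
    ("vidím kočku", "I see a cat"),
]

src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
for cs, en in corpus:
    src_words, tgt_words = set(cs.split()), set(en.split())
    src_count.update(src_words)   # sentences containing each Czech word
    tgt_count.update(tgt_words)   # sentences containing each English word
    # sentence pairs in which both words occur together
    pair_count.update((s, t) for s in src_words for t in tgt_words)

def dice(s, t):
    """Dice coefficient: high when s and t tend to appear in the same sentence pairs."""
    return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

def best_equivalent(s):
    """English word with the strongest association to the Czech word s."""
    return max(tgt_count, key=lambda t: dice(s, t))

print(best_equivalent("kočku"))
print(best_equivalent("mám"))
```

Raw counts alone would not be enough here: "kočku" co-occurs with "I" and "a" just as often as with "cat", and only the normalisation by overall frequency singles out "cat".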

Example of an error during phrase extraction. The system learns a translation pair "nemám" = "I have" which has the opposite meaning.

In some cases, this approach leads to systematic errors. The picture illustrates a common issue with negation -- in many languages (such as Czech), negation is expressed by a prefix on the verb ("ne" in this case). Moreover, Czech uses double negatives: word by word, the sentence "Nemám žádnou kočku." corresponds to "I-do-not-have no cat.", whereas the natural English translation is "I have no cat." When the two sentences are aligned, "nemám" ends up paired with "I have", so the automatic procedure learns the wrong translation rule "nemám" = "I have". Whenever this rule is applied, the meaning of the translation is completely reversed.
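The extraction step that produces this rule can be sketched as follows. This is a simplified version of consistent phrase-pair extraction, not the code of any particular system; the word alignment is an assumption of what an aligner would plausibly output for this sentence pair, and the function name is hypothetical.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Return all phrase pairs consistent with the word alignment.

    alignment is a set of (source index, target index) links. A pair is
    consistent if no word inside the target span is linked to a source
    word outside the source span.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            consistent = all(
                i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2
            )
            if consistent:
                pairs.append(
                    (" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
                )
    return pairs

src = ["nemám", "žádnou", "kočku"]
tgt = ["I", "have", "no", "cat"]
# Assumed alignment: nemám-I, nemám-have, žádnou-no, kočku-cat
alignment = {(0, 0), (0, 1), (1, 2), (2, 3)}

pairs = extract_phrase_pairs(src, tgt, alignment)
for cs, en in pairs:
    print(cs, "=", en)
```

Every extracted pair is locally consistent with the alignment, which is exactly the problem: the negation carried by "no" lies outside the span of "nemám", so the pair "nemám" = "I have" looks valid in isolation.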

Other notorious errors involve named entities, where personal names get translated instead of being kept as they are:

Jan Novák potkal Karla Poláka. -> John Smith met Charles Pole.