MT Evaluation in General
Revision as of 14:30, 26 January 2015
Lecture video: web TODO Youtube
{{#ev:youtube|_QL-BUxIIhU|800|center}}
Data Splits
The available data is usually split into several parts, e.g. training, development (held-out) and (dev-)test sets. The training set is used to estimate model parameters, the development set for model selection, hyperparameter tuning etc., and the dev-test set for continuous evaluation of progress (are we doing better than before?).
However, you should always keep an additional (final) test set that is used only very rarely. Because this set plays no part in the development process, the system is not biased towards it, and the score on the final test set serves as a rough estimate of the system's true performance.
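The split into training, development, dev-test and final test sets can be sketched in Python as follows. This is a minimal sketch that assumes the corpus is a list of (source, target) sentence pairs; the function name `split_corpus`, the fixed seed and the set sizes are hypothetical choices for illustration, not part of the original course material.

```python
import random

def split_corpus(pairs, dev_size, devtest_size, test_size, seed=42):
    """Shuffle (source, target) pairs and split them into train/dev/devtest/test.

    Hypothetical helper for illustration; the fixed seed makes the split reproducible.
    """
    if dev_size + devtest_size + test_size > len(pairs):
        raise ValueError("held-out sets are larger than the corpus")
    shuffled = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    # Carve out the final test set first, then dev-test, then dev; the rest is training data.
    test = shuffled[:test_size]
    devtest = shuffled[test_size:test_size + devtest_size]
    start = test_size + devtest_size
    dev = shuffled[start:start + dev_size]
    train = shuffled[start + dev_size:]
    return train, dev, devtest, test

corpus = [(f"src {i}", f"tgt {i}") for i in range(1000)]
train, dev, devtest, test = split_corpus(corpus, dev_size=100, devtest_size=100, test_size=100)
print(len(train), len(dev), len(devtest), len(test))
```

Shuffling individual sentence pairs is the simplest option. If the corpus contains near-duplicate sentences or many sentences from the same document, splitting by document may be safer, because otherwise closely related sentences can end up on both sides of the split.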
The "golden rule" of (MT) evaluation: Evaluate on unseen data!
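One cheap sanity check of the golden rule is to look for test sentences that also occur in the training data. The sketch below uses a hypothetical helper, `overlap_with_training`, and only catches exact matches of the source side after stripping surrounding whitespace. Near-duplicates would need fuzzier matching.

```python
def overlap_with_training(train_pairs, test_pairs):
    """Return the test pairs whose source sentence also occurs in the training data.

    Hypothetical helper: compares source sides exactly, ignoring surrounding whitespace.
    """
    seen_sources = {src.strip() for src, _ in train_pairs}
    return [(src, tgt) for src, tgt in test_pairs if src.strip() in seen_sources]

# Toy data, made up for illustration only.
train_pairs = [("Guten Morgen", "Good morning"), ("Danke", "Thanks")]
test_pairs = [("Guten Morgen ", "Good morning"), ("Schwer verletzt", "Seriously injured")]
print(overlap_with_training(train_pairs, test_pairs))
```

A non-empty result means some "unseen" test data was in fact seen during training, so scores on that set would be optimistic.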
Example Sentence + Translations
Original German sentence:
Arbeiter stürzte von Leiter: schwer verletzt
English reference translation:
Worker falls from ladder: seriously injured
MT Output | Notes |
---|---|
Workers rushed from director: Seriously injured | plural (workers), bad choice of verb (rushed), Leiter mistranslated as director |
Workers fell from ladder: hurt | plural (workers), intensifier missing |
Worker rushed from ladder: schwer verletzt | bad choice of verb (rushed), tail is left untranslated |
Worker fell from leader: heavily injures | Leiter translated as leader (not a typo, a bad lexical choice), poor morphological choices |
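To show why judging such outputs automatically is hard, the sketch below scores each MT output from the table with a naive clipped unigram precision against the reference. This is the share of output words that also appear in the reference. The toy score is an assumption chosen for illustration, not a metric defined on this page. The tokenizer simply lowercases the text and keeps word characters.

```python
import re
from collections import Counter

REFERENCE = "Worker falls from ladder: seriously injured"

# MT outputs copied from the example table.
MT_OUTPUTS = [
    "Workers rushed from director: Seriously injured",
    "Workers fell from ladder: hurt",
    "Worker rushed from ladder: schwer verletzt",
    "Worker fell from leader: heavily injures",
]

def tokenize(sentence):
    """Lowercase and keep only runs of word characters (drops the colon)."""
    return re.findall(r"\w+", sentence.lower())

def unigram_precision(hypothesis, reference):
    """Share of hypothesis tokens found in the reference, clipped by reference counts."""
    hyp_counts = Counter(tokenize(hypothesis))
    ref_counts = Counter(tokenize(reference))  # missing tokens count as 0
    matched = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0

for output in MT_OUTPUTS:
    print(f"{unigram_precision(output, REFERENCE):.2f}  {output}")
```

Running it prints 0.50, 0.40, 0.50 and 0.33 for the four outputs, in order. The second output, whose notes list only the plural and the missing intensifier, scores lower than the first, which mistranslates ''Leiter'' as ''director''. Exact word matching gives no credit for ''fell'' vs. ''falls'', ''hurt'' vs. ''seriously injured'', or ''Workers'' vs. ''Worker''.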