MT Evaluation in General
Lecture 4: General MT Evaluation
Revision as of 14:10, 26 January 2015
| Lecture video: | web TODO, Youtube |
|---|---|
Data Splits
The available data is usually split into several parts, e.g. **training**, **development** (held-out) and **(dev-)test**. The training data is used to estimate model parameters, the development set is used for model selection, hyperparameter tuning etc., and the dev-test set is used for continuous evaluation of progress (are we doing better than before?).
However, you should always keep an additional **(final) test set** which is used only very rarely. Evaluating your system on the final test set gives a rough estimate of its true performance: because the set plays no part in the development process, your system is not biased towards it.
The "golden rule" of (MT) evaluation: **Evaluate on unseen data!**
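The splits described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed procedure: the function name `split_corpus`, the split names and the 80/10/5/5 proportions are assumptions made for the example. It also assumes the corpus contains no duplicate sentence pairs, which would otherwise leak between the parts and break the golden rule.

```python
import random


def split_corpus(pairs, fractions=(0.8, 0.1, 0.05, 0.05), seed=0):
    """Shuffle sentence pairs and cut them into four disjoint parts.

    The names and proportions are illustrative assumptions, not fixed rules.
    """
    names = ("train", "dev", "devtest", "test")
    if len(fractions) != len(names) or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("need one fraction per split, summing to 1")

    shuffled = list(pairs)
    # A fixed seed keeps the split reproducible, so the final test set
    # stays the same (and stays unseen) across experiments.
    random.Random(seed).shuffle(shuffled)

    splits = {}
    start = 0
    for i, (name, fraction) in enumerate(zip(names, fractions)):
        if i == len(names) - 1:
            end = len(shuffled)  # the last split takes whatever remains
        else:
            end = start + round(fraction * len(shuffled))
        splits[name] = shuffled[start:end]
        start = end
    return splits


if __name__ == "__main__":
    # Toy corpus of (source, target) sentence pairs.
    corpus = [(f"source sentence {i}", f"target sentence {i}") for i in range(100)]
    for name, part in split_corpus(corpus).items():
        print(name, len(part))
```

In this sketch, `train` would feed parameter estimation, `dev` tuning and model selection, `devtest` the day-to-day progress checks, and `test` would be set aside and touched only for the rare final evaluation.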