Scoring and Optimization: Difference between revisions

Revision as of 08:03, 25 August 2015

Features of MT Models

Phrase Translation Probabilities

Phrase translation probabilities are calculated from occurrences of phrase pairs extracted from the parallel training data. Usually, MT systems work with the following two conditional probabilities:

$P(\mathbf {e} |\mathbf {f} )$
$P(\mathbf {f} |\mathbf {e} )$

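These probabilities are estimated by relative frequency: for the first formula, we count how many times we saw $\mathbf {e}$ aligned to $\mathbf {f}$ and divide by how many times we saw $\mathbf {f}$ in total; the second formula is estimated symmetrically. For example, the following excerpt from the (sorted) extracted phrase pairs illustrates the second formula, with the English phrase as the conditioning side. The Czech phrase occurs in 3 of the 9 pairs whose English side is "estimated in the programme", so we estimate that $P({\text{naznačena v programu}}|{\text{estimated in the programme}})=3/9$ .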

estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| odhadován v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| odhadovány v programu 
estimated in the programme ||| předpokládal program
estimated in the programme ||| v programu uvedeným
estimated in the programme ||| v programu uvedeným
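
The counting above can be sketched in a few lines of Python. This is only an illustration, not the extraction code of any particular MT toolkit: the function name `estimate_phrase_probs` and the variable `EXTRACT` are made up for this example, and the input is just the nine-line excerpt, with the English phrase treated as $\mathbf {e}$ and the Czech phrase as $\mathbf {f}$.

```python
from collections import Counter
from fractions import Fraction

# The nine extracted phrase pairs from the excerpt above ("e ||| f" per line).
EXTRACT = """\
estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| naznačena v programu
estimated in the programme ||| odhadován v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| odhadovány v programu
estimated in the programme ||| předpokládal program
estimated in the programme ||| v programu uvedeným
estimated in the programme ||| v programu uvedeným
"""


def estimate_phrase_probs(lines):
    """Relative-frequency estimates of P(f|e) and P(e|f), keyed by (e, f)."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        e, f = (side.strip() for side in line.split("|||"))
        pairs.append((e, f))

    pair_counts = Counter(pairs)               # count(e, f)
    e_counts = Counter(e for e, _ in pairs)    # count(e)
    f_counts = Counter(f for _, f in pairs)    # count(f)

    p_f_given_e = {(e, f): Fraction(c, e_counts[e])
                   for (e, f), c in pair_counts.items()}
    p_e_given_f = {(e, f): Fraction(c, f_counts[f])
                   for (e, f), c in pair_counts.items()}
    return p_f_given_e, p_e_given_f


p_f_given_e, p_e_given_f = estimate_phrase_probs(EXTRACT.splitlines())
# 3 of the 9 pairs with this English side have this Czech side: 3/9 = 1/3.
print(p_f_given_e[("estimated in the programme", "naznačena v programu")])
```

Because the excerpt contains only one English phrase, the reverse direction $P(\mathbf {e} |\mathbf {f} )$ computed from it alone comes out as 1 for every pair; a meaningful estimate needs the counts of each $\mathbf {f}$ over the whole set of extracted pairs.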

Lexical Weights

Lexical weights are a method for smoothing the phrase table. Infrequent phrases have unreliable probability estimates; for instance, many long phrases occur together only once in the corpus, resulting in $P(\mathbf {e} |\mathbf {f} )=P(\mathbf {f} |\mathbf {e} )=1$ . Several methods exist for computing lexical weights. The most common one is based on the word alignment inside the phrase pair. The probability of each foreign word $f_{j}$ is estimated as the average of the lexical translation probabilities $w(f_{j},e_{i})$ over the English words $e_{i}$ aligned to it, and these per-word averages are multiplied over all $l_{f}$ words of the foreign phrase. Thus for the phrase pair $(\mathbf {e} ,\mathbf {f} )$ with the set of alignment points $a$ , the lexical weight is:

${\text{lex}}(\mathbf {f} |\mathbf {e} ,a)=\prod _{j=1}^{l_{f}}{\frac {1}{|\{i\mid (i,j)\in a\}|}}\sum _{\forall (i,j)\in a}w(f_{j},e_{i})$
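
The formula can be written out as a short Python function. This is a minimal sketch under stated assumptions: the function name `lexical_weight`, the 0-based indexing, and the toy probabilities in `w` are invented for illustration and are not estimated from any corpus. The formula above is undefined for a foreign word with no alignment point; the sketch follows the common convention of scoring such a word against a special `NULL` token, which is an assumption beyond the text here.

```python
from collections import defaultdict


def lexical_weight(f_words, e_words, alignment, w):
    """Compute lex(f|e,a) for one phrase pair.

    alignment: set of (i, j) pairs, i indexing e_words and j indexing
               f_words (0-based here, unlike the 1-based formula).
    w:         dict mapping (f_word, e_word) to a lexical translation
               probability; missing entries count as 0.0.
    """
    # Collect, for each foreign position j, the English positions aligned to it.
    aligned = defaultdict(list)
    for i, j in alignment:
        aligned[j].append(i)

    score = 1.0
    for j, f_word in enumerate(f_words):
        if aligned[j]:
            probs = [w.get((f_word, e_words[i]), 0.0) for i in aligned[j]]
        else:
            # Assumption: an unaligned foreign word is scored against NULL.
            probs = [w.get((f_word, "NULL"), 0.0)]
        # Average over the aligned English words, then multiply into the product.
        score *= sum(probs) / len(probs)
    return score


# Toy example with made-up probabilities.
e_words = ["in", "the", "programme"]
f_words = ["v", "programu"]
alignment = {(0, 0), (1, 1), (2, 1)}  # "v"-"in", "programu"-"the", "programu"-"programme"
w = {
    ("v", "in"): 0.5,
    ("programu", "the"): 0.25,
    ("programu", "programme"): 0.75,
}

# "v" contributes 0.5; "programu" contributes (0.25 + 0.75) / 2 = 0.5.
print(lexical_weight(f_words, e_words, alignment, w))  # 0.25
```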

Language Model

https://www.coursera.org/course/nlp

https://www.youtube.com/playlist?list=PLaRKlIqjjguC-20Glu7XVAXm6Bd6Gs7Qi

Word and Phrase Penalty

Distortion Penalty

Decoding

Phrase-Based Search

Decoding in SCFG

Optimization of Feature Weights

Note that there have even been shared tasks in model optimization: one, by invitation only, in 2011 (http://www.statmt.org/wmt11/tunable-metrics-task.html) and one in 2015, the WMT15 Tuning Task (http://www.statmt.org/wmt15/tuning-task/).

Revision as of 08:03, 25 August 2015 by Bojar (Optimization of Feature Weights: links to tuning tasks); previous revision as of 15:28, 24 August 2015 by Tamchyna (no edit summary).