Scoring and Optimization

{{#ev:youtube|https://www.youtube.com/watch?v=rDkZOINdPhw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V%7C800%7Ccenter}}

Features of MT Models

Phrase Translation Probabilities

Lexical Weights

Lexical weights are a method for smoothing the phrase table. Infrequent phrases have unreliable probability estimates; for instance, many long phrases occur together only once in the corpus, resulting in <math>P(\mathbf{e}|\mathbf{f}) = P(\mathbf{f}|\mathbf{e}) = 1</math>. Several methods exist for computing lexical weights. The most common one is based on word alignment inside the phrase (Koehn, PhD thesis). The probability of each foreign word <math>f_j</math> is estimated as the average of lexical translation probabilities <math>w(f_j, e_i)</math> over the English words aligned to it. Thus for the phrase pair <math>(\mathbf{e},\mathbf{f})</math> with the set of alignment points <math>a</math>, the lexical weight is:

<math>\text{lex}(\mathbf{f}|\mathbf{e},a) = \prod_{j=1}^{l_f} \frac{1}{|\{i \mid (i,j) \in a\}|} \sum_{\forall(i,j) \in a} w(f_j, e_i)</math>
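
The formula can be illustrated with a minimal Python sketch. The function name lexical_weight, the dictionary layout of w and the toy probabilities are all assumptions made for illustration; they are not taken from any toolkit. Positions are 0-based here, whereas the formula counts from 1. The text does not say how to score a foreign word with no alignment points; the sketch assumes the common convention of scoring it against a special NULL token.

```python
from collections import defaultdict


def lexical_weight(f_words, e_words, alignment, w, null_token="NULL"):
    """Compute lex(f|e,a) for one phrase pair.

    alignment: iterable of (i, j) pairs linking e_words[i] to f_words[j].
    w: dict mapping (foreign_word, english_word) to a lexical probability
       (hypothetical layout chosen for this sketch).
    """
    # Group the aligned English positions by foreign position j.
    aligned = defaultdict(list)
    for i, j in alignment:
        aligned[j].append(i)

    weight = 1.0
    for j, f in enumerate(f_words):
        links = aligned.get(j)
        if links:
            # Average w(f_j, e_i) over all English words aligned to f_j.
            weight *= sum(w.get((f, e_words[i]), 0.0) for i in links) / len(links)
        else:
            # Assumption: unaligned foreign words are scored against NULL.
            weight *= w.get((f, null_token), 0.0)
    return weight


# Toy example with made-up probabilities.
w = {
    ("das", "the"): 0.5,
    ("Haus", "house"): 0.8,
    ("nicht", "does"): 0.2,
    ("nicht", "not"): 0.6,
}
one_to_one = lexical_weight(["das", "Haus"], ["the", "house"], [(0, 0), (1, 1)], w)
one_to_many = lexical_weight(["nicht"], ["does", "not"], [(0, 0), (1, 0)], w)
```

In the first call each foreign word has a single alignment point, so the weight is simply the product 0.5 × 0.8. In the second call "nicht" is aligned to two English words, so its factor is the average of 0.2 and 0.6.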

Language Model

Word and Phrase Penalty

Distortion Penalty

Decoding

Phrase-Based Search

Decoding in SCFG

Optimization of Feature Weights