Scoring and Optimization: Difference between revisions
=== Lexical Weights ===
Lexical weights are a method for smoothing the phrase table. Infrequent phrases have unreliable
probability estimates; for instance, many long phrases occur together only once
in the corpus, resulting in <math>P(\mathbf{e}|\mathbf{f}) = P(\mathbf{f}|\mathbf{e})
= 1</math>. Several methods exist for computing lexical weights. The most common one
is based on the word alignment inside the phrase (Koehn, 2003). The
probability of each ''foreign'' word <math>f_j</math> is estimated as the average of
the lexical translation probabilities <math>w(f_j, e_i)</math> over the English words aligned
to it. Thus, for the phrase pair <math>(\mathbf{e},\mathbf{f})</math> with the set of alignment
points <math>a</math>, the lexical weight is:
<math>
\text{lex}(\mathbf{f}|\mathbf{e},a) = \prod_{j=1}^{l_f}
\frac{1}{|\{i \mid (i,j) \in a\}|} \sum_{\forall(i,j) \in a} w(f_j, e_i)
</math>
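The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the phrase pair, alignment set, and lexical translation table <code>w</code> below are hypothetical toy values, and unaligned foreign words (which in practice are aligned to a NULL token) are simply skipped.

```python
def lexical_weight(f_phrase, e_phrase, alignment, w):
    """Compute lex(f|e, a): for each foreign position j, average the
    lexical translation probabilities w(f_j, e_i) over the English
    positions i aligned to j, then take the product over all j."""
    total = 1.0
    for j, f_word in enumerate(f_phrase):
        aligned = [i for (i, jj) in alignment if jj == j]
        if not aligned:
            continue  # simplification: real systems align these to NULL
        total *= sum(w[(f_word, e_phrase[i])] for i in aligned) / len(aligned)
    return total

# Toy example: French "la maison" / English "the house",
# alignment points (i, j) with i = English index, j = foreign index.
w = {("la", "the"): 0.9, ("maison", "house"): 0.8}
lex = lexical_weight(["la", "maison"], ["the", "house"], {(0, 0), (1, 1)}, w)
# lex = 0.9 * 0.8 = 0.72
```

With each foreign word aligned to exactly one English word, the averages are trivial and the weight is just the product of the individual probabilities; multiple alignment points per foreign word would exercise the averaging term.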
=== Language Model ===
Revision as of 14:52, 24 August 2015
Lecture video:
{{#ev:youtube|https://www.youtube.com/watch?v=rDkZOINdPhw&index=11&list=PLpiLOsNLsfmbeH-b865BwfH15W0sat02V%7C800%7Ccenter}}