Phrase-based Model
Revision as of 15:31, 7 April 2015
Lecture video: https://www.youtube.com/watch?v=aA4jFayPNeQ
Phrase-based machine translation (PBMT) is probably the most widely used approach to MT today. It is relatively simple and easy to adapt to new languages.
Phrase Extraction
PBMT uses phrases as the basic unit of translation. Phrases are simply contiguous sequences of words that have been observed in the training data; they do not need to correspond to any linguistic notion of a phrase.
In order to obtain a phrase table (a probabilistic dictionary of phrases), we need word-aligned parallel data. Using the alignment links, a simple heuristic is applied to extract consistent phrase pairs. Consider the word-aligned example sentence:
A phrase pair consists of a contiguous source span and a contiguous target span such that every word in the source span is aligned only to words inside the target span, and vice versa. These are examples of phrases consistent with this word alignment:
On the other hand, if either a source word or a target word is aligned outside of the current span, the phrase cannot be extracted. The conflicting alignment points are drawn in yellow:
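The consistency condition can be expressed as a small check. The following is a minimal Python sketch, assuming the alignment is stored as a set of 0-based (source_index, target_index) links and that spans are inclusive index ranges. The function name `is_consistent` and the German-English example sentence are hypothetical, not taken from the original figures. Following the usual convention, the sketch also rejects a pair that contains no alignment link at all.

```python
from typing import Set, Tuple

Alignment = Set[Tuple[int, int]]  # {(source_index, target_index), ...}, 0-based


def is_consistent(alignment: Alignment, src_start: int, src_end: int,
                  tgt_start: int, tgt_end: int) -> bool:
    """Check whether source words src_start..src_end and target words
    tgt_start..tgt_end (inclusive) form a phrase pair consistent with the alignment."""
    has_link_inside = False
    for s, t in alignment:
        src_inside = src_start <= s <= src_end
        tgt_inside = tgt_start <= t <= tgt_end
        if src_inside != tgt_inside:
            # Conflicting alignment point: one end inside the pair, the other outside.
            return False
        if src_inside:
            has_link_inside = True
    # A pair with no alignment link inside it is not extracted either.
    return has_link_inside


if __name__ == "__main__":
    # Hypothetical example: "das Haus ist klein" / "the house is small", aligned 1:1.
    links = {(0, 0), (1, 1), (2, 2), (3, 3)}
    print(is_consistent(links, 0, 1, 0, 1))  # True:  "das Haus" <-> "the house"
    print(is_consistent(links, 0, 1, 0, 0))  # False: "Haus" is aligned outside "the"
```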
In practice, only phrases up to a certain length are extracted (e.g. 7 words). Longer phrases would hardly ever be used by the translation model (unless it was presented with a sentence from the training data) and the phrase table would be extremely large.
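The whole extraction step, including the length limit, can be sketched as follows, assuming the same 0-based link representation. For each source span of at most `max_len` words, it takes the smallest target span covering all linked target words and keeps the pair only if no target word in that span is aligned outside the source span. This is a simplified variant: unlike fuller extraction heuristics, it does not additionally extend phrases over unaligned target words at the span edges. `extract_phrases` and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Set[Tuple[int, int]],
                    max_len: int = 7) -> Set[Tuple[str, str]]:
    """Extract phrase pairs (up to max_len words per side) consistent with
    the 0-based word alignment. Simplified: no extension over unaligned
    target words at the span edges."""
    phrases: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # no alignment link inside: nothing to extract
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue  # target side too long
            # Reject if a target word in the span is aligned outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            phrases.add((" ".join(src[s_start:s_end + 1]),
                         " ".join(tgt[t_start:t_end + 1])))
    return phrases


if __name__ == "__main__":
    src = "das Haus ist klein".split()
    tgt = "the house is small".split()
    links = {(0, 0), (1, 1), (2, 2), (3, 3)}
    print(len(extract_phrases(src, tgt, links)))             # 10
    print(len(extract_phrases(src, tgt, links, max_len=2)))  # 7
```

With a one-to-one monotone alignment every contiguous span is consistent, so lowering `max_len` only removes the longer pairs; with real alignments the limit mainly keeps the table from growing with every long span seen in training.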
Phrase Scoring
Once we have extracted all consistent phrase pairs from our training data, we can assign translation probabilities to them using maximum likelihood estimation. To estimate the probability of target phrase $\bar{e}$ being the translation of source phrase $\bar{f}$, we simply count:
$$P(\bar{e} \mid \bar{f}) = \frac{\operatorname{count}(\bar{f}, \bar{e})}{\operatorname{count}(\bar{f})}$$
The formula tells us to count how many times we saw $\bar{f}$ translated as $\bar{e}$ in our training data and divide that by the number of times we saw $\bar{f}$ in total.
In practice, other scores are also computed (e.g. the inverse probability $P(\bar{f} \mid \bar{e})$), but that's a topic for another lecture.
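The counting can be sketched directly with relative frequencies. This is a minimal Python illustration, assuming the input is a list of extracted (source phrase f, target phrase e) occurrences, one entry per extraction. Counting f over these extracted pairs makes the probabilities P(e|f) for a given f sum to one. The function name `score_phrases` and the example counts are hypothetical; the inverse direction P(f|e) is included only because it falls out of the same counts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]  # (source phrase f, target phrase e)


def score_phrases(pairs: Iterable[PhrasePair]) -> Dict[PhrasePair, Tuple[float, float]]:
    """Relative-frequency (maximum likelihood) estimates for each phrase pair.

    Returns {(f, e): (P(e|f), P(f|e))}, with counts taken over all
    extracted phrase-pair occurrences.
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)                   # count(f, e)
    source_counts = Counter(f for f, _ in pairs)   # count(f)
    target_counts = Counter(e for _, e in pairs)   # count(e)
    return {
        (f, e): (c / source_counts[f], c / target_counts[e])
        for (f, e), c in pair_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical extracted occurrences: "Haus" seen 3x as "house", 1x as "home".
    extracted = [("Haus", "house")] * 3 + [("Haus", "home"), ("Heim", "home")]
    table = score_phrases(extracted)
    print(table[("Haus", "house")])  # (0.75, 1.0)
    print(table[("Haus", "home")])   # (0.25, 0.5)
```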
Decoding
When we get an input sentence for translation, the first step is to look up translation options (possible translations) for each source span in the phrase table. These can be thought of as jigsaw puzzle pieces, which are combined to produce the best possible final translation. The search for this best combination (i.e. the most probable translation according to the model) is usually called decoding.
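The lookup of translation options and a very reduced form of the search can be sketched in Python. This is a toy under strong assumptions: the phrase table is a hypothetical dictionary mapping a source phrase to (target phrase, P(e|f)) entries, the output keeps the source phrase order (no reordering), and the only score is the product of phrase translation probabilities. The names `translation_options` and `monotone_decode` are illustrative, not an existing decoder's API.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical phrase table: source phrase -> [(target phrase, P(e|f)), ...].
# All probabilities are assumed to be strictly positive.
PhraseTable = Dict[str, List[Tuple[str, float]]]
Span = Tuple[int, int]  # (start, end), end exclusive


def translation_options(words: List[str], table: PhraseTable,
                        max_len: int = 7) -> Dict[Span, List[Tuple[str, float]]]:
    """Look up every source span (up to max_len words) in the phrase table."""
    options: Dict[Span, List[Tuple[str, float]]] = {}
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            phrase = " ".join(words[start:end])
            if phrase in table:
                options[(start, end)] = table[phrase]
    return options


def monotone_decode(words: List[str], table: PhraseTable,
                    max_len: int = 7) -> Optional[str]:
    """Toy search: cover the input left to right with non-overlapping spans,
    maximizing the product of P(e|f). No reordering, no other scores."""
    n = len(words)
    options = translation_options(words, table, max_len)
    # best[k] = (log-probability, translation) of the best cover of words[:k]
    best: List[Tuple[float, str]] = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_text = best[start]
            if prev_score == -math.inf or (start, end) not in options:
                continue
            for target, prob in options[(start, end)]:
                score = prev_score + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, (prev_text + " " + target).strip())
    # None means some part of the input has no translation option at all.
    return best[n][1] if best[n][0] > -math.inf else None


if __name__ == "__main__":
    toy_table: PhraseTable = {
        "das": [("the", 0.7), ("that", 0.3)],
        "Haus": [("house", 0.8), ("home", 0.2)],
        "das Haus": [("the house", 0.6)],
        "ist": [("is", 0.9)],
        "klein": [("small", 0.6), ("little", 0.4)],
    }
    print(monotone_decode("das Haus ist klein".split(), toy_table))  # the house is small
```

Because the toy model scores each piece independently and keeps the source order, dynamic programming over span end positions finds the exact best combination. Once reordering and further scores enter the model, this simple recursion no longer suffices.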