Word Alignment - Revision history

https://mttalks.ufal.ms.mff.cuni.cz/index.php?action=history&feed=atom&title=Word_Alignment Word Alignment - Revision history 2026-07-21T11:26:39Z Revision history for this page on the wiki MediaWiki 1.41.0 https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=402&oldid=prev Tamchyna at 10:07, 25 March 2015 2015-03-25T10:07:15Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 10:07, 25 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l51">Line 51:</td> <td colspan="2" class="diff-lineno">Line 51:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== See Also ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== See Also ==</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Adam Lopez' [http://lectures.ms.mff.cuni.cz/video/recordshow/index/44/172 lecture] on IBM 
model 1 and the accompanying [http://ufal.mff.cuni.cz/mtm13/files/05-word-alignment-adam-lopez.pdf slides]</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* </ins>Adam Lopez' [http://lectures.ms.mff.cuni.cz/video/recordshow/index/44/172 lecture] on IBM model 1 and the accompanying [http://ufal.mff.cuni.cz/mtm13/files/05-word-alignment-adam-lopez.pdf slides]</div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">* Philipp Koehn's [http://www.statmt.org/book/slides/04-word-based-models.pdf#page=30 slides] (the slides contain the pseudo-code for IBM1)</ins></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: 
#202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><references /></div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><references /></div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=401&oldid=prev Tamchyna at 10:01, 25 March 2015 2015-03-25T10:01:35Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 10:01, 25 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l31">Line 31:</td> <td colspan="2" class="diff-lineno">Line 31:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In 
the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> <ins style="font-weight: bold; text-decoration: none;">(because our initial probability is uniform) </ins>from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. 
That is, we collect ''fractional counts''.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td></tr> <tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l48">Line 48:</td> <td colspan="2" class="diff-lineno">Line 48:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* [https://codex3.ms.mff.cuni.cz/codex-trans/?groupId=3&taskId=11&module=groups%2Ftasks&page=specification Implement IBM-1 word alignment] </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* [https://codex3.ms.mff.cuni.cz/codex-trans/?groupId=3&taskId=11&module=groups%2Ftasks&page=specification Implement IBM-1 word alignment] </div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; 
vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">== See Also ==</ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">Adam Lopez' [http://lectures.ms.mff.cuni.cz/video/recordshow/index/44/172 lecture] on IBM model 1 and the accompanying [http://ufal.mff.cuni.cz/mtm13/files/05-word-alignment-adam-lopez.pdf slides]</ins></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td 
style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== References ==</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><references /></div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div><references /></div></td></tr> </table>
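The expectation step described in this revision can be sketched in Python as follows. This is a minimal illustration, not the wiki author's implementation: the function name `expectation_step`, the toy corpus, and the `uniform` probability function are all assumptions made for the example. The sketch follows the usual IBM Model 1 convention in which each target word distributes one unit of weight over the source words of its sentence, in proportion to the current P(t|s). With uniform probabilities every link in a sentence therefore gets the same weight; in the toy corpus each sentence has two source and two target words, so that weight is 1/2.

```python
from collections import defaultdict

def expectation_step(corpus, prob):
    """Collect fractional counts for every (target, source) word pair.

    corpus: list of (source_words, target_words) sentence pairs.
    prob:   function prob(t, s) giving the current estimate of P(t|s).
    """
    counts = defaultdict(float)
    for source, target in corpus:
        for t in target:
            # Total weight this target word assigns to the source words.
            norm = sum(prob(t, s) for s in source)
            for s in source:
                # Fractional link weight between t and s in this sentence.
                counts[(t, s)] += prob(t, s) / norm
    return counts

def uniform(t, s):
    # Hypothetical initial table: every pair is equally likely.
    # The constant cancels out in the normalisation above.
    return 0.25

# Toy parallel corpus (an assumption made for this example).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

counts = expectation_step(corpus, uniform)
print(counts[("the", "das")])    # seen together in two sentences
print(counts[("house", "Haus")])  # seen together in one sentence
```

Pairs that co-occur in more sentences accumulate larger fractional counts, which is what the maximization step later turns into higher probabilities.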

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=400&oldid=prev Tamchyna at 16:46, 24 March 2015 2015-03-24T16:46:28Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:46, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l35">Line 35:</td> <td colspan="2" class="diff-lineno">Line 35:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{<del style="font-weight: bold; text-decoration: none;">frac\_count</del>}(t_{j_1}, s_i)}{\sum_{t'} \text{<del style="font-weight: bold; text-decoration: none;">frac\_count</del>}(t', s_i)}</math></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{<ins style="font-weight: bold; text-decoration: none;">count</ins>}(t_{j_1}, s_i)}{\sum_{t'} \text{<ins style="font-weight: bold; text-decoration: none;">count</ins>}(t', s_i)}</math></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: 
pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> </table>
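The normalization in the maximization step can be illustrated with a short Python sketch. The function name `maximization_step` and the fractional counts below are hypothetical values chosen for the example, not data from the wiki page; the code only shows how count(t, s) is divided by the sum of counts over all target words t' seen with the same source word s.

```python
def maximization_step(counts):
    """Turn fractional counts into conditional probabilities P(t|s).

    counts: dict mapping (target_word, source_word) to a fractional count.
    """
    # Denominator: total fractional count collected for each source word.
    totals = {}
    for (t, s), c in counts.items():
        totals[s] = totals.get(s, 0.0) + c
    # Normalise every count by the total of its source word.
    return {(t, s): c / totals[s] for (t, s), c in counts.items()}

# Hypothetical fractional counts, as an expectation step might produce.
counts = {
    ("the", "das"): 1.0,
    ("house", "das"): 0.5,
    ("book", "das"): 0.5,
    ("the", "Haus"): 0.5,
    ("house", "Haus"): 0.5,
}

probs = maximization_step(counts)
print(probs[("the", "das")])    # 1.0 / (1.0 + 0.5 + 0.5) = 0.5
print(probs[("house", "das")])  # 0.5 / 2.0 = 0.25
```

After normalization the probabilities for each source word sum to one, and the target word with the largest fractional count receives the highest translation probability.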

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=399&oldid=prev Tamchyna at 16:45, 24 March 2015 2015-03-24T16:45:50Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:45, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l35">Line 35:</td> <td colspan="2" class="diff-lineno">Line 35:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{<del style="font-weight: bold; text-decoration: none;">frac_count</del>}(t_{j_1}, s_i)}{\sum_{t'} \text{<del style="font-weight: bold; text-decoration: none;">frac_count</del>}(t', s_i)}</math></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{<ins style="font-weight: bold; text-decoration: none;">frac\_count</ins>}(t_{j_1}, s_i)}{\sum_{t'} \text{<ins style="font-weight: bold; text-decoration: none;">frac\_count</ins>}(t', s_i)}</math></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; 
white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=398&oldid=prev Tamchyna at 16:45, 24 March 2015 2015-03-24T16:45:22Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:45, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l35">Line 35:</td> <td colspan="2" class="diff-lineno">Line 35:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone through our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model):</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{frac_count}(t_{j_1}, s_i)}{\sum_{t'} \text{frac_count}(t', s_i)}</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><math>P(t_{j_1}|s_i) = \frac{\text{frac_count}(t_{j_1}, s_i)}{\sum_{t'} \text{frac_count}(t', s_i)}<ins style="font-weight: bold; text-decoration: none;"></math></ins></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; 
border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=397&oldid=prev Tamchyna at 16:45, 24 March 2015 2015-03-24T16:45:09Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:45, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l33">Line 33:</td> <td colspan="2" class="diff-lineno">Line 33:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. 
That is, we collect ''fractional counts''.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model)<del style="font-weight: bold; text-decoration: none;">. The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</del></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone <ins style="font-weight: bold; text-decoration: none;">through </ins>our whole corpus. 
We have seen a given source word <math>s_i</math> with some target words <math>t_{j_1},\ldots,t_{j_n}</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model)<ins style="font-weight: bold; text-decoration: none;">:</ins></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). 
In this simple model, we are guaranteed to find a solution globally optimal for our data.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"><math>P(t_{j_1}|s_i) = \frac{\text{frac_count}(t_{j_1}, s_i)}{\sum_{t'} \text{frac_count}(t', s_i)}</math></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to 
our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). In this simple model, we are guaranteed to find a solution <ins style="font-weight: bold; text-decoration: none;">which is </ins>globally optimal for our data.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Model Limitations ===</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Model Limitations ===</div></td></tr> </table>
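
The first pass described in this revision (links of weight 1/T from each source word to each target word, followed by normalization of the fractional counts) can be sketched in Python as below. This is a minimal illustration and not code from the page: the function name `first_em_iteration`, the variable names, and the two-sentence toy corpus are assumptions made for the example. Only the first iteration is covered; later iterations would reweight the links using the learned probabilities, which is not shown here.

```python
from collections import defaultdict


def first_em_iteration(corpus):
    """Run one expectation step with uniform link weights, then one
    maximization step. `corpus` is a list of (source_words, target_words)
    pairs. Hypothetical helper written for illustration only."""
    # frac_count[s][t] accumulates the fractional count of seeing
    # source word s linked to target word t.
    frac_count = defaultdict(lambda: defaultdict(float))

    # Expectation step: with uniform translation probabilities, every
    # source word is linked to each of the T target words with weight 1/T.
    for source_words, target_words in corpus:
        weight = 1.0 / len(target_words)
        for s in source_words:
            for t in target_words:
                frac_count[s][t] += weight

    # Maximization step: normalize the fractional counts for each source
    # word so that P(t|s) sums to 1 over all target words seen with s.
    prob = {}
    for s, counts in frac_count.items():
        total = sum(counts.values())
        prob[s] = {t: count / total for t, count in counts.items()}
    return prob


# Toy corpus (assumed for the example): two sentence pairs sharing "das"/"the".
toy_corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
]
probabilities = first_em_iteration(toy_corpus)
```

In this toy corpus "das" co-occurs with "the" in both sentences, so after normalization "the" receives a larger share of the probability for "das" than "house" or "book" does, which matches the remark that frequently co-occurring words end up with a higher translation probability.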

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=396&oldid=prev Tamchyna at 16:39, 24 March 2015 2015-03-24T16:39:24Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:39, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l33">Line 33:</td> <td colspan="2" class="diff-lineno">Line 33:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T</math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. 
That is, we collect ''fractional counts''.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math><del style="font-weight: bold; text-decoration: none;">t_j_1</del>,\ldots,<del style="font-weight: bold; text-decoration: none;">t_j_n</del></math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. 
We have seen a given source word <math>s_i</math> with some target words <math><ins style="font-weight: bold; text-decoration: none;">t_{j_1}</ins>,\ldots,<ins style="font-weight: bold; text-decoration: none;">t_{j_n}</ins></math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). 
In this simple model, we are guaranteed to find a solution globally optimal for our data.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). In this simple model, we are guaranteed to find a solution globally optimal for our data.</div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=395&oldid=prev Tamchyna at 16:38, 24 March 2015 2015-03-24T16:38:52Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:38, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l31">Line 31:</td> <td colspan="2" class="diff-lineno">Line 31:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>In 
the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T<math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T<<ins style="font-weight: bold; text-decoration: none;">/</ins>math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. 
We have seen a given source word <math>s_i</math> with some target words <math>t_j_1,\ldots,t_j_n</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i</math> with some target words <math>t_j_1,\ldots,t_j_n</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=394&oldid=prev Tamchyna at 16:38, 24 March 2015 2015-03-24T16:38:18Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:38, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l33">Line 33:</td> <td colspan="2" class="diff-lineno">Line 33:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T<math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T<math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. 
That is, we collect ''fractional counts''.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i<math> with some target words <math>t_j_1,\ldots,t_j_n</math> some (fractional) number of times. Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i<<ins style="font-weight: bold; text-decoration: none;">/</ins>math> with some target words <math>t_j_1,\ldots,t_j_n</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). In this simple model, we are guaranteed to find a solution globally optimal for our data.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. 
We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). In this simple model, we are guaranteed to find a solution globally optimal for our data.</div></td></tr> </table>

Tamchyna https://mttalks.ufal.ms.mff.cuni.cz/index.php?title=Word_Alignment&diff=393&oldid=prev Tamchyna at 16:37, 24 March 2015 2015-03-24T16:37:39Z

<p></p> <table style="background-color: #fff; color: #202122;" data-mw="interface"> <col class="diff-marker" /> <col class="diff-content" /> <col class="diff-marker" /> <col class="diff-content" /> <tr class="diff-title" lang="en"> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td> <td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 16:37, 24 March 2015</td> </tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l30">Line 30:</td> <td colspan="2" class="diff-lineno">Line 30:</td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Initially, all of our translation probabilities are uniform.</div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; 
vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">In the '''expectation step''', we apply them to our data, i.e. we draw alignment links. Assuming our current sentence contains <math>T</math> target words, we draw a link of "weight" <math>1/T<math> from each source word to each target word. And for each source word, we remember that we have seen it aligned to each of the target words in the sentence <math>1/T</math> times. That is, we collect ''fractional counts''.</ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">Now assume that we have gone our whole corpus. We have seen a given source word <math>s_i<math> with some target words <math>t_j_1,\ldots,t_j_n</math> some (fractional) number of times. 
Now in the '''maximization step''', we turn these fractional counts into probabilities simply by normalizing them (this is the optimal thing to do, we could formally derive that this is the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimate] for this model). The words that occurred frequently with our source word <math>s_i</math> will have a higher fractional count, and therefore a higher translation probability.</ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr> <tr><td colspan="2" class="diff-side-deleted"></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">In the next iteration, we apply our newly learned probabilities to our data, collect the fractional counts etc. We can run this algorithm for a limited number of iterations or check for convergence (the conditional probabilities stop changing). 
In this simple model, we are guaranteed to find a solution globally optimal for our data.</ins></div></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br></td></tr> <tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Model Limitations ===</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>=== Model Limitations ===</div></td></tr> </table>

Tamchyna