Constituency Trees

From MT Talks
Revision as of 09:44, 6 August 2015 by Tamchyna
Lecture 10: Constituency Trees
Lecture video: web TODO
YouTube: https://www.youtube.com/watch?v=y_9SEdG1u3U

Context Free Grammar

In general, a grammar describes a potentially infinite set of strings (sentences) using a finite set of production rules. In a context-free grammar (CFG), production rules take the following form:

V → w

V is a nonterminal symbol (for natural languages, nonterminals usually correspond to phrases, such as NP for noun phrases) and w is a non-empty string of terminals (words) and nonterminals. CFGs cannot fully describe natural languages (the cross-serial dependencies of Swiss German, for example, lie beyond their power), but for MT they serve as a very useful simplification.
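The defining property of the form V → w can be checked mechanically. The following is a minimal Python sketch, not code from the lecture: it assumes rules are stored as pairs of symbol tuples, and the names `Rule`, `is_context_free` and `NONTERMINALS` are hypothetical. A rule counts as context-free here when its left-hand side is exactly one nonterminal and, as in the definition above, its right-hand side is non-empty.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A production rule lhs -> rhs, each side a tuple of symbols (hypothetical representation)."""
    lhs: tuple[str, ...]
    rhs: tuple[str, ...]


def is_context_free(rule: Rule, nonterminals: set[str]) -> bool:
    """True if the lhs is a single nonterminal and the rhs is non-empty."""
    return (
        len(rule.lhs) == 1
        and rule.lhs[0] in nonterminals
        and len(rule.rhs) > 0
    )


# Nonterminal inventory assumed for this illustration.
NONTERMINALS = {"S", "NP", "VP", "Det", "Adj", "N"}

# One nonterminal on the left: context-free.
print(is_context_free(Rule(("NP",), ("Det", "Adj", "N")), NONTERMINALS))
# A hypothetical rule whose rewriting depends on a neighbouring symbol: not context-free.
print(is_context_free(Rule(("NP", "VP"), ("NP", "sleep")), NONTERMINALS))
```

The first call prints `True` and the second `False`: a rule such as `NP VP -> NP sleep` rewrites VP only in the context of a preceding NP, which is exactly what the context-free form rules out.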

One nonterminal symbol serves as the start symbol, where generation starts (or where analysis ends); for natural languages, we usually use the symbol S (sentence). A small example grammar:

 S -> NP VP
 NP -> dogs
 VP -> sleep
 NP -> Det Adj N
 Det -> the
 Adj -> black
 N -> cat
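To make generation from S concrete, here is a minimal Python sketch (an illustration, not the lecture's own code) that encodes the toy grammar above as a dictionary and enumerates every sentence it derives, together with its constituency tree. The names `GRAMMAR`, `derive` and `bracketed` are hypothetical, and the sketch assumes the grammar has no recursive rules, so the enumeration terminates.

```python
from itertools import product

# The toy grammar: each nonterminal maps to its alternative right-hand sides.
GRAMMAR: dict[str, list[tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("dogs",), ("Det", "Adj", "N")],
    "VP": [("sleep",)],
    "Det": [("the",)],
    "Adj": [("black",)],
    "N": [("cat",)],
}


def derive(symbol):
    """Return every (tree, words) pair derivable from `symbol`.

    Symbols without rules are terminals and derive themselves. A tree is a
    nested tuple (label, child1, child2, ...). Assumes no recursive rules.
    """
    if symbol not in GRAMMAR:
        return [(symbol, (symbol,))]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every derivation of each right-hand-side symbol.
        for children in product(*(derive(child) for child in rhs)):
            tree = (symbol, *(t for t, _ in children))
            words = tuple(w for _, ws in children for w in ws)
            results.append((tree, words))
    return results


def bracketed(tree):
    """Render a tree in bracketed notation, e.g. (NP dogs)."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


for tree, words in derive("S"):
    print(f"{' '.join(words):<22}{bracketed(tree)}")
```

Running it lists two sentences, "dogs sleep" with the tree `(S (NP dogs) (VP sleep))` and "the black cat sleep" with `(S (NP (Det the) (Adj black) (N cat)) (VP sleep))`. The second sentence shows that nothing in this toy grammar enforces subject-verb agreement.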

Syntax in Machine Translation

Synchronous Grammars for Translation

Translating with SCFG

Synchronous Grammar Extraction