Constituency Trees: Difference between revisions

From MT Talks
Revision as of 09:44, 6 August 2015

Lecture 10: Constituency Trees
Lecture video: web TODO, YouTube (https://www.youtube.com/watch?v=y_9SEdG1u3U)

Context-Free Grammar

Grammars, generally, are a way of describing a potentially infinite set of strings (sentences) using a finite set of production rules. In a context-free grammar, production rules take the following form:

V → w

V is a nonterminal symbol (for natural languages, nonterminals usually correspond to phrases, such as NP for noun phrases) and w is a (non-empty) string of terminals (words) and nonterminals. CFGs are known to be unable to fully describe natural languages, but for MT they serve as a very useful simplification.

One nonterminal symbol serves as the top-level nonterminal, where generation starts (or where analysis ends); for natural languages, we usually use the symbol S (sentence). For example:

 S -> NP VP
 NP -> dogs
 VP -> sleep
 NP -> Det Adj N
 Det -> the
 Adj -> black
 N -> cat
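
To see what this grammar generates, here is a minimal Python sketch that enumerates every sentence derivable from S. The dictionary layout and the names GRAMMAR and expand are illustrative assumptions, not part of the lecture; the exhaustive enumeration only terminates because this toy grammar has no recursive rules.

```python
from itertools import product

# Toy grammar from the text: each nonterminal maps to its right-hand sides.
# (The dictionary representation is an assumption made for this sketch.)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["dogs"], ["Det", "Adj", "N"]],
    "VP": [["sleep"]],
    "Det": [["the"]],
    "Adj": [["black"]],
    "N": [["cat"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    Only safe for non-recursive grammars such as the one above.
    """
    if symbol not in GRAMMAR:
        # Anything without a rule is treated as a terminal (a word).
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the
        # alternatives in left-to-right order.
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

The sketch yields "dogs sleep" and "the black cat sleep". The second sentence is ungrammatical in English, because the rules above do not encode number agreement between the subject and the verb.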

Syntax in Machine Translation

Synchronous Grammars for Translation

Translating with SCFG

Synchronous Grammar Extraction