Constituency Trees
Revision as of 09:48, 6 August 2015
Lecture video: https://www.youtube.com/watch?v=y_9SEdG1u3U (YouTube)
Context-Free Grammar
Grammars, generally, are a way of describing a potentially infinite set of strings (sentences) using a finite set of production rules. In a context-free grammar, production rules take the following form:
V → w
V is a single nonterminal symbol (for natural languages, nonterminals usually correspond to phrases, such as NP for noun phrases) and w is a non-empty string of terminals (words) and nonterminals. Context-free grammars (CFGs) are known to be unable to describe natural languages fully, but for machine translation (MT) they serve as a very useful simplification.
One nonterminal symbol serves as the top-level nonterminal where the generation starts (or where analysis ends) -- for natural languages, we usually use the symbol S (sentence).
The following example grammar can generate (or analyze) sentences such as "Dogs sleep" (grammatical English), but also "The black cat sleep" (ungrammatical, because the singular subject does not agree with the plural verb form).
 S -> NP VP
 NP -> dogs
 VP -> sleep
 NP -> Det Adj N
 Det -> the
 Adj -> black
 N -> cat
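To see exactly which sentences this grammar licenses, the rules can be expanded mechanically. The following is a minimal Python sketch, not part of the original notes: the dictionary encoding and the helper name `expand` are assumptions made for illustration, and the exhaustive expansion terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# Rules copied from the example grammar above.
# Any symbol that is not a key of GRAMMAR is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["dogs"], ["Det", "Adj", "N"]],
    "VP": [["sleep"]],
    "Det": [["the"]],
    "Adj": [["black"]],
    "N": [["cat"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol` (hypothetical helper)."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: derives only itself
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(sentences)  # ['dogs sleep', 'the black cat sleep']
```

The grammar generates exactly two sentences, and the second one is the ungrammatical "the black cat sleep": nothing in the rules ties the number of the NP to the number of the VP.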
We can mend this grammar by splitting each nonterminal into two variants, one for singular and one for plural, so that a singular NP can combine only with a singular VP. In general, however, linguistic phrases have many independent properties, and each one handled this way multiplies the number of variants again, so the number of nonterminals grows exponentially with the number of properties.
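The split can be sketched in the same style. This is an illustrative Python sketch under stated assumptions: the `_sg`/`_pl` suffixes, the helper names, the decision to keep a single unsplit start symbol S, and the singular verb form "sleeps" (which does not occur in the original grammar) are all introduced here for the example.

```python
from itertools import product

# Number-split variant of the example grammar (hypothetical encoding).
# S stays unsplit and simply requires NP and VP to agree in number.
SPLIT_GRAMMAR = {
    "S": [["NP_sg", "VP_sg"], ["NP_pl", "VP_pl"]],
    "NP_pl": [["dogs"]],
    "NP_sg": [["Det_sg", "Adj_sg", "N_sg"]],
    "VP_pl": [["sleep"]],
    "VP_sg": [["sleeps"]],  # assumed word, not in the original grammar
    "Det_sg": [["the"]],
    "Adj_sg": [["black"]],
    "N_sg": [["cat"]],
}


def expand(symbol, grammar):
    """Return every word sequence derivable from `symbol` in `grammar`."""
    if symbol not in grammar:
        return [[symbol]]  # terminal: derives only itself
    results = []
    for rhs in grammar[symbol]:
        for parts in product(*(expand(s, grammar) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


def variants_per_nonterminal(num_binary_properties):
    """Variants needed per nonterminal if every property has two values."""
    return 2 ** num_binary_properties


split_sentences = sorted(" ".join(w) for w in expand("S", SPLIT_GRAMMAR))
print(split_sentences)  # ['dogs sleep', 'the black cat sleeps']

# One property (number) doubles each nonterminal; each further independent
# binary property doubles it again.
print([variants_per_nonterminal(k) for k in range(1, 5)])  # [2, 4, 8, 16]
```

With agreement encoded in the nonterminals, "the black cat sleep" is no longer derivable. The price is visible in the second printout: every additional independent property doubles the number of variants, which is the exponential growth described above.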