Rich Vocabulary
Revision as of 13:34, 12 August 2015
Lecture video: web TODO, Youtube
{{#ev:youtube|https://www.youtube.com/watch?v=eSIbNT-yjdg|800|center}}
Examples of Languages with a Rich Vocabulary
German -- compounding
While German has some degree of inflection, it is the Germans' fondness for complex compound words that causes the large-vocabulary problem for MT. Consider the following compound:
[[File:rindfleish-prezi.png|600px]]
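To make the vocabulary effect concrete, the sketch below splits a compound into stems that a system already knows, so a previously unseen compound can be handled through its parts. It is a minimal illustration, not a method described in this lecture: the tiny `LEXICON`, the list of linking elements in `LINKERS` and the function name `split_compound` are all assumptions chosen for the example.

```python
from functools import lru_cache

# Hypothetical toy lexicon of known stems (lowercased). A real system would
# use a large vocabulary, typically with corpus frequencies.
LEXICON = {"rind", "fleisch", "suppe", "topf"}

# Simplified set of linking elements that may appear between two stems.
LINKERS = ("", "s", "n", "es")


def split_compound(word, lexicon=LEXICON):
    """Return one segmentation of `word` into known stems, or None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the word means the segmentation is complete.
        if start == len(word):
            return ()
        # Try the longest candidate stem first.
        for end in range(len(word), start, -1):
            stem = word[start:end]
            if stem not in lexicon:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A non-empty linker must be followed by another stem.
                if linker and nxt >= len(word):
                    continue
                if word.startswith(linker, end):
                    rest = solve(nxt)
                    if rest is not None:
                        return (stem,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


print(split_compound("Rindfleischsuppe"))
print(split_compound("Suppentopf"))
```

With only four stems in the lexicon, the splitter covers any compound built from them, which is why splitting can shrink the effective vocabulary. The sketch ignores ambiguity between competing segmentations, which a practical splitter would have to rank.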
Finnish -- agglutination
Agglutinative languages (such as Finnish, Turkish or Hungarian) often attach many affixes (prefixes or suffixes) to words. These affixes can express grammatical properties or change the meaning of the word, as shown in the example:
[[File:finnish-prezi.png|500px]]
Finnish nouns are said to have over 2000 possible inflected forms each. The number of unique word forms in Finnish can therefore be astronomical.
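The multiplicative growth behind this number can be sketched with a toy generator. The example below concatenates a few suffix slots onto the stem "talo" (house); the slot inventories are a small hand-picked subset chosen for illustration, and the sketch assumes plain concatenation, ignoring consonant gradation, vowel harmony and other alternations that real Finnish morphology requires.

```python
from itertools import product

STEM = "talo"  # "house"; chosen because plain concatenation works for it

# Hand-picked, deliberately incomplete suffix slots (illustrative assumption).
NUMBER = ["", "i"]                          # singular, plural marker
CASE = ["ssa", "sta", "lla", "lta"]         # four of the locative cases
POSSESSIVE = ["", "ni", "si", "mme", "nne"]  # none, my, your, our, your (pl.)
CLITIC = ["", "kin", "ko"]                  # none, "also", question particle


def generate_forms(stem=STEM):
    """Return all distinct forms obtained by filling every slot once."""
    slots = product(NUMBER, CASE, POSSESSIVE, CLITIC)
    return sorted({stem + n + c + p + k for n, c, p, k in slots})


forms = generate_forms()
# The count is the product of the slot sizes: 2 * 4 * 5 * 3.
print(len(forms))
print("taloissanikin" in forms)  # "also in my houses"
```

Even this reduced inventory yields 120 forms of a single noun. Adding the remaining cases and clitic combinations multiplies the count further, which is how figures in the thousands arise.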
Czech -- fusional inflection
Fusional languages differ from agglutinative languages in that they fuse multiple properties into a single affix. In Czech, one suffix can describe case, gender and number at the same time. On the other hand, fusional affixes tend to be ambiguous (e.g. an identical suffix can be used for multiple morphological cases).
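The ambiguity of fusional suffixes can be seen by inverting a declension table. The sketch below uses the standard paradigm of the Czech feminine noun "žena" (woman); the data structure and the names `PARADIGM_ZENA` and `analyses` are assumptions made for this illustration.

```python
from collections import defaultdict

# Declension of the feminine noun "žena" (woman): (case, number) -> form.
PARADIGM_ZENA = {
    ("nom", "sg"): "žena",
    ("gen", "sg"): "ženy",
    ("dat", "sg"): "ženě",
    ("acc", "sg"): "ženu",
    ("voc", "sg"): "ženo",
    ("loc", "sg"): "ženě",
    ("ins", "sg"): "ženou",
    ("nom", "pl"): "ženy",
    ("gen", "pl"): "žen",
    ("dat", "pl"): "ženám",
    ("acc", "pl"): "ženy",
    ("voc", "pl"): "ženy",
    ("loc", "pl"): "ženách",
    ("ins", "pl"): "ženami",
}


def analyses(paradigm):
    """Invert the paradigm: surface form -> list of (case, number) readings."""
    readings = defaultdict(list)
    for features, form in paradigm.items():
        readings[form].append(features)
    return dict(readings)


by_form = analyses(PARADIGM_ZENA)
for form, feats in by_form.items():
    print(form, feats)
```

Each suffix encodes case and number together (gender is fixed by the noun), yet the 14 paradigm cells collapse into only 10 distinct forms: "ženy" alone has four possible readings, so the correct one has to be recovered from context.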
Morphologically rich languages tend to impose strong agreement constraints on the suffixes (the inflection of an adjective must agree with its governing noun, and the subject and objects must agree with the verb inflection). Consider the following example:
[[File:czech-inflection-prezi.png|500px]]