Intro

From MT Talks
Revision as of 13:23, 31 October 2014

Ambiguity in language

Unusual grammatical constructions with unexpected meaning can be used to (deliberately) mislead a human reader. These are called garden path sentences. Consider some of the best-known examples:

  • Fat people eat accumulates.
  • The horse raced past the barn fell.
  • The government plans to raise taxes were defeated.

Everyday sentences, however, contain countless ambiguities that humans resolve so naturally that they do not even notice them. Resolving them requires knowledge of the world and of the context.

The plant is next to the bank.

  • plant
    • factory?
    • flower?
  • bank
    • financial institution?
    • river side?
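Because each ambiguous word multiplies the number of possible readings, even this short sentence has four interpretations. The following Python sketch enumerates them and then picks one sense per word with a simplified Lesk-style overlap heuristic (a swapped-in technique, not one described on this page). The sense inventory and its keyword glosses are hypothetical. The heuristic can only choose because an extra context sentence is supplied; with the example sentence alone, there is nothing to choose by.

```python
from itertools import product

# Hypothetical sense inventory for the two ambiguous words in the example;
# the keyword "glosses" are illustrative, not taken from any real lexicon.
SENSES = {
    "plant": {
        "factory": {"industrial", "building", "workers", "production"},
        "flower": {"leaves", "grow", "water", "garden", "soil"},
    },
    "bank": {
        "financial institution": {"money", "account", "loan", "deposit"},
        "river side": {"river", "water", "shore", "grass"},
    },
}

def readings(words):
    """Enumerate every combination of senses for the ambiguous words."""
    ambiguous = [w for w in words if w in SENSES]
    choices = [list(SENSES[w]) for w in ambiguous]
    return [dict(zip(ambiguous, combo)) for combo in product(*choices)]

def disambiguate(word, context):
    """Simplified Lesk: pick the sense whose gloss shares most words with the context."""
    context = set(context)
    return max(SENSES[word], key=lambda sense: len(SENSES[word][sense] & context))

sentence = "the plant is next to the bank".split()
print(len(readings(sentence)))  # 4 readings

# Hypothetical surrounding context that makes the choice possible.
context = "we sat on the grass by the river and watched the water".split()
print(disambiguate("plant", context), "|", disambiguate("bank", context))
```

With this context, "bank" overlaps on river, water and grass, so the heuristic selects "river side"; "plant" selects "flower" via the single shared word "water".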

Types of MT systems

Vauquois triangle, illustrating the possible approaches to linguistic abstraction in MT.

Approaches to MT can be categorized by whether they work directly with surface words or rely on some (linguistic) abstraction. Many successful MT systems disregard linguistic information altogether and treat words as unrelated, indivisible units. Other systems perform linguistic analysis of the source sentence and then transfer it, either to some abstract representation or directly to target-side surface words. In the first case, a target-side generation step is needed to produce the surface words of the translation.
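To make the contrast concrete, here is a toy Python sketch that translates English into German in two ways: direct word-for-word lookup of surface forms, and a minimal analysis, transfer and generation pipeline that works on lemmas plus a number feature. All dictionaries and agreement rules are hypothetical and cover only this example (one masculine noun, nominative case).

```python
# Toy English->German example; every table below is hypothetical.

# Direct approach: map each surface word to a fixed target surface word.
DIRECT = {"the": "der", "dog": "Hund", "dogs": "Hunde",
          "sleeps": "schläft", "sleep": "schlafen"}

def translate_direct(sentence):
    return " ".join(DIRECT[w] for w in sentence.split())

# Step 1, analysis: surface word -> (lemma, category, number).
ANALYSIS = {
    "the": ("the", "DET", None),
    "dog": ("dog", "NOUN", "sg"), "dogs": ("dog", "NOUN", "pl"),
    "sleeps": ("sleep", "VERB", "sg"), "sleep": ("sleep", "VERB", "pl"),
}
# Step 2, transfer: translate lemmas, not surface forms.
TRANSFER = {"the": "der", "dog": "Hund", "sleep": "schlafen"}
# Step 3, generation: inflect target lemmas (masculine nominative only).
ARTICLE_FORMS = {("der", "sg"): "der", ("der", "pl"): "die"}
NOUN_FORMS = {("Hund", "sg"): "Hund", ("Hund", "pl"): "Hunde"}
VERB_FORMS = {("schlafen", "sg"): "schläft", ("schlafen", "pl"): "schlafen"}

def translate_transfer(sentence):
    analyzed = [ANALYSIS[w] for w in sentence.split()]
    # The number of the (single) noun drives article and verb agreement.
    number = next(num for _, cat, num in analyzed if cat == "NOUN")
    out = []
    for lemma, cat, num in analyzed:
        target = TRANSFER[lemma]
        if cat == "DET":
            out.append(ARTICLE_FORMS[(target, number)])
        elif cat == "NOUN":
            out.append(NOUN_FORMS[(target, num)])
        else:  # VERB
            out.append(VERB_FORMS[(target, num)])
    return " ".join(out)

for s in ["the dog sleeps", "the dogs sleep"]:
    print(translate_direct(s), "|", translate_transfer(s))
```

For "the dogs sleep", direct lookup produces the ungrammatical "der Hunde schlafen" because "the" is translated in isolation. The transfer pipeline produces "die Hunde schlafen" because the article form is chosen during generation from the noun's number.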

Another possible distinction is how the systems are "trained". In the past, linguistic experts manually developed rules describing the analysis, transfer or generation for a particular language pair. Some of these rule-based systems grew into very mature, complex systems. However, they can be very costly to build and difficult to adapt, whether to a new genre/domain or to different languages. The other end of this continuum is occupied by purely statistical systems, which only require data and use statistical models or machine learning to capture the knowledge required for translation.
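As one classic illustration of learning translation knowledge purely from data, the sketch below estimates word translation probabilities t(e|f) with IBM Model 1 and expectation maximization. This specific model is chosen here for illustration; the text does not name it. The three-sentence German-English corpus is hypothetical, and the NULL source word of the full model is omitted for brevity.

```python
from collections import defaultdict

# Tiny hypothetical German-English parallel corpus: (source, target).
CORPUS = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]

def train_ibm1(corpus, iterations=10):
    """Estimate t(e|f) with IBM Model 1 EM (simplified: no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in corpus]
    e_vocab = {e for _, es in pairs for e in es}
    # Uniform initialisation of t(e|f).
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in pairs:
            for e in es:
                # E-step: split one count for e over all source words in the pair,
                # proportionally to the current t(e|f).
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise expected counts into probabilities per source word.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

t = train_ibm1(CORPUS)
for f in ["das", "Haus", "Buch", "ein"]:
    best = max((e for (e, src) in t if src == f), key=lambda e: t[(e, f)])
    print(f, "->", best, round(t[(best, f)], 2))
```

No rule states that "Haus" means "house". The co-occurrence statistics alone raise t(house | Haus) over the iterations, so the most probable translations come out as das→the, Haus→house, Buch→book and ein→a.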