Data Acquisition

From MT Talks
Revision as of 16:07, 23 February 2015 by Tamchyna (talk | contribs)

A seemingly universal rule applies to statistical methods in NLP, and not only to them: more data is better data.

Translation systems have at their disposal orders of magnitude more training data than a person reads in a lifetime[1].

[2]

[3]


token vs. type
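
When measuring corpus size, tokens are usually understood as running words (every occurrence counts), while types are distinct word forms (each counts once). The following is a minimal sketch of that distinction, assuming naive whitespace tokenization and lowercasing; the function name `token_type_counts` and the example sentence are illustrative only, and a real pipeline would use a proper tokenizer.

```python
from collections import Counter


def token_type_counts(text):
    """Return (number of tokens, number of types) for a piece of text.

    Assumption: tokens are whitespace-separated, lowercased strings.
    """
    tokens = text.lower().split()   # every running word is a token
    types = Counter(tokens)         # each distinct word form is a type
    return len(tokens), len(types)


n_tokens, n_types = token_type_counts(
    "the cat sat on the mat because the mat was warm"
)
print(n_tokens, n_types)  # 11 tokens, 8 types
```

Here "the" occurs three times and "mat" twice, so the sentence has 11 tokens but only 8 types.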

References

  1. Philipp Koehn. Inaugural lecture.
  2. A. Halevy, P. Norvig, F. Pereira. The Unreasonable Effectiveness of Data.
  3. Jan Hajič, Eva Hajičová. Some of Our Best Friends Are Statisticians.