Data Acquisition

From MT Talks
There seems to be a universal rule for statistical methods in NLP (and not only for them): '''more data is better data.'''


Translation systems have at their disposal orders of magnitude more training data than a person reads in a lifetime<ref name="inaug">Philipp Koehn. [https://www.youtube.com/watch?v=6UVgFjJeFGY Inaugural lecture.]</ref>.
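To make the comparison concrete, here is a back-of-the-envelope sketch. Every number in it (250 words per minute, one hour of reading per day, 70 years of reading, and a corpus of 10^10 words) is an illustrative assumption, not a figure from the lecture; the function names are hypothetical.

```python
import math

# ASSUMPTIONS (illustrative only): reading speed, daily reading time and
# reading lifespan are rough guesses, not measured values.
def lifetime_words_read(words_per_minute: float = 250,
                        minutes_per_day: float = 60,
                        years: float = 70) -> float:
    """Rough estimate of how many words one person reads in a lifetime."""
    return words_per_minute * minutes_per_day * 365 * years


def orders_of_magnitude(corpus_words: float, person_words: float) -> float:
    """Number of powers of ten separating a corpus size from a person's reading."""
    return math.log10(corpus_words / person_words)


person = lifetime_words_read()
print(f"{person:.2e}")  # 3.83e+08

# HYPOTHETICAL corpus size, chosen only to show the calculation.
corpus = 1e10
print(round(orders_of_magnitude(corpus, person), 1))  # 1.4
```

Changing any of the assumed parameters shifts the result, but the logarithm makes the "orders of magnitude" comparison explicit.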
 
The role of data in NLP is discussed in more depth by Halevy, Norvig and Pereira<ref name=effectiveness>Alon Halevy, Peter Norvig, Fernando Pereira. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4804817&tag=1 ''The Unreasonable Effectiveness of Data'']</ref> and by Hajič and Hajičová<ref name=friends>Jan Hajič, Eva Hajičová. [http://link.springer.com/chapter/10.1007%2F978-3-540-74628-7_2#page-1 ''Some of Our Best Friends Are Statisticians'']</ref>.
 
 
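When measuring the amount of data, it is important to distinguish ''tokens'' (running words, where every occurrence counts) from ''types'' (distinct word forms): the sentence "the cat saw the dog" contains five tokens but only four types.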
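A minimal sketch of the token/type distinction, assuming naive lowercased whitespace tokenization (real MT pipelines use dedicated tokenizers, so actual counts will differ). The function name `count_tokens_and_types` is hypothetical.

```python
def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (number of tokens, number of types) for a text.

    ASSUMPTION: lowercase + whitespace splitting stands in for a real tokenizer.
    """
    tokens = text.lower().split()  # every occurrence of a word is a token
    types = set(tokens)            # each distinct word form is one type
    return len(tokens), len(types)


print(count_tokens_and_types("the cat saw the dog"))  # (5, 4)
```

Corpus sizes are usually reported in tokens; the number of types grows much more slowly than the number of tokens as a corpus gets larger.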


== References ==


<references />

Revision as of 16:07, 23 February 2015
