Data Acquisition: Difference between revisions

Revision as of 16:07, 23 February 2015

There seems to be a universal rule for statistical methods in NLP, and not only for them: more data is better data.

Translation systems have at their disposal orders of magnitude more training data than a person reads in a lifetime^[1].

token vs. type
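The amount of data can be counted in two ways. Tokens are running words, so every occurrence counts. Types are distinct word forms, so each form is counted once. The sketch below counts both for a short sentence. It assumes naive lowercased whitespace tokenization, and `token_type_counts` is a hypothetical helper rather than any library API. Real NLP pipelines use proper tokenizers, which would change both counts.

```python
from collections import Counter


def token_type_counts(text: str) -> tuple[int, int]:
    """Return (number of tokens, number of types) for a text.

    Assumption: tokens are lowercased, whitespace-separated strings;
    punctuation handling and real tokenization are deliberately ignored.
    """
    tokens = text.lower().split()   # every occurrence is a token
    types = Counter(tokens)         # each distinct form is one type
    return len(tokens), len(types)


sample = "the cat sat on the mat and the dog sat too"
n_tokens, n_types = token_type_counts(sample)
print(n_tokens, n_types)  # prints "11 8"
```

Here "the" contributes three tokens but only one type, and "sat" contributes two tokens and one type. This is why a corpus always has at least as many tokens as types.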

@@ Line 1: / Line 1: @@
 There seems to be a universal rule for (not only) statistical methods in NLP: '''More data is better data.'''
-Translation systems have at their disposal (order of magnitude) more data than a person reads in a lifetime<ref name="inaug">Phillip Koehn. [https://www.youtube.com/watch?v=6UVgFjJeFGY Inaugural lecture.]</ref>.
+Translation systems have at their disposal (orders of magnitude) more training data than a person reads in a lifetime<ref name="inaug">Philipp Koehn. [https://www.youtube.com/watch?v=6UVgFjJeFGY Inaugural lecture.]</ref>.
+<ref name=effectiveness>A. Halevy, P. Norvig, F. Pereira. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4804817&tag=1 ''The Unreasonable Effectiveness of Data'']</ref>
+<ref name=friends>Jan Hajič, Eva Hajičová. [http://link.springer.com/chapter/10.1007%2F978-3-540-74628-7_2#page-1 ''Some of Our Best Friends Are Statisticians'']</ref>
+token vs. type
 == References ==
 <references />