Data Acquisition
Revision as of 16:07, 23 February 2015
A seemingly universal rule applies to statistical methods in NLP, and not only to them: More data is better data.
Translation systems have at their disposal orders of magnitude more training data than a person reads in a lifetime[1]. [2][3]
token vs. type
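The token/type distinction noted above matters whenever corpus size is reported: tokens are the running word occurrences in a text, while types are the distinct word forms among them. A minimal sketch, assuming naive whitespace tokenization and lowercasing (real pipelines use a proper tokenizer; the function name `token_type_counts` and the sample sentence are made up for illustration):

```python
from collections import Counter


def token_type_counts(text):
    """Return (number of tokens, number of types) for a text.

    Assumption: tokens are whitespace-separated, lowercased strings.
    """
    tokens = text.lower().split()   # every running word occurrence
    types = Counter(tokens)         # one entry per distinct word form
    return len(tokens), len(types)


sample = "the cat sat on the mat and the dog sat too"
n_tokens, n_types = token_type_counts(sample)
print(n_tokens, n_types)  # prints "11 8"
```

Here "the" contributes three tokens but only one type, which is why the token count grows with every added sentence while the type count grows more slowly.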
References
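1. Philipp Koehn. Inaugural lecture. https://www.youtube.com/watch?v=6UVgFjJeFGY
2. A. Halevy, P. Norvig, F. Pereira. The Unreasonable Effectiveness of Data. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4804817&tag=1
3. Jan Hajič, Eva Hajičová. Some of Our Best Friends Are Statisticians. http://link.springer.com/chapter/10.1007%2F978-3-540-74628-7_2#page-1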