First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez

TL;DR
This paper examines the history of large language models to identify lessons and ongoing challenges, emphasizing that data quality, evaluation, and innovative approaches remain crucial despite recent successes like ChatGPT.
Contribution
It offers a historical perspective on LLM development, highlighting enduring problems and guiding future research directions in the era of large models.
Findings
Disparities in scale are transient and can be addressed by researchers.
Data quality remains a bottleneck for many NLP applications.
Realistic evaluation of LLMs is still an open challenge.
Abstract
Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs). After such a disruptive change to our understanding of the field, what is left to do? Taking a historical lens, we look for guidance from the first era of LLMs, which began in 2005 with large n-gram models for machine translation (MT). We identify durable lessons from the first era, and more importantly, we identify evergreen problems where NLP researchers can continue to make meaningful contributions in areas where LLMs are ascendant. We argue that disparities in scale are transient and researchers can work to reduce them; that data, rather than hardware, is still a bottleneck for many applications; that meaningful realistic evaluation is still an open problem; and that there is still room for speculative approaches.
Taxonomy
Topics: Computational and Text Analysis Methods
