Awes, Laws, and Flaws From Today's LLM Research
Adrian de Wynter

TL;DR
This paper critically examines the methodology of recent large language model research, highlighting trends, issues, and the effectiveness of checklists, and offers recommendations for improving research rigor and ethics.
Contribution
It provides a comprehensive analysis of over 2,000 LLM studies, identifying methodological trends and proposing improvements for research practices.
Findings
Decline in ethics disclaimers over time
Rise in LLMs used as evaluators
Increase in claims of reasoning abilities without human validation
Abstract
We perform a critical examination of the scientific methodology behind contemporary large language model (LLM) research. For this, we assess over 2,000 research works released between 2020 and 2024 based on criteria typical of what is considered good research (e.g., presence of statistical tests and reproducibility), and cross-validate it with arguments that are at the centre of controversy (e.g., claims of emergent behaviour). We find multiple trends, such as declines in ethics disclaimers, a rise of LLMs as evaluators, and an increase in claims of LLM reasoning abilities without leveraging human evaluation. We note that conference checklists are effective at curtailing some of these issues, but balancing velocity and rigour in research cannot solely rely on these. We tie these findings to those from recent meta-reviews and extend recommendations on how to address what does, does…
Taxonomy
Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law · Law, AI, and Intellectual Property
