Show Your Work: Improved Reporting of Experimental Results
Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A., Smith

TL;DR
This paper highlights the importance of reporting validation performance during model development, introduces a method to estimate expected validation performance based on computation, and advocates for improved reporting practices in NLP research.
Contribution
It proposes a novel technique to estimate expected validation performance as a function of computation, enhancing the robustness of model comparisons.
Findings
Validation performance reporting varies widely across papers.
Estimated computation time to reach certain accuracy levels varies from hours to weeks.
Using the proposed method can change conclusions about model superiority.
Abstract
Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best. We argue for reporting additional details, especially performance on validation data obtained during model development. We present a novel technique for doing so: expected validation performance of the best-found model as a function of computation budget (i.e., the number of hyperparameter search trials or the overall training time). Using our approach, we find multiple recent model comparisons where authors would have reached a different conclusion if they had used more (or less) computation. Our approach also allows us to estimate the amount of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
