Show Your Work: Improved Reporting of Experimental Results

Jesse Dodge; Suchin Gururangan; Dallas Card; Roy Schwartz; Noah A. Smith

arXiv:1909.03004·cs.LG·September 9, 2019·19 cites

TL;DR

This paper argues that test-set scores alone are not enough to decide which model performs best, introduces a method for estimating expected validation performance as a function of computation budget, and advocates improved reporting practices in NLP research.

Contribution

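It proposes a novel technique for estimating the expected validation performance of the best-found model as a function of computation budget (the number of hyperparameter search trials or the overall training time), which makes model comparisons more robust.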
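
One way to write such an estimate down is sketched here, under the assumption that the N observed validation scores come from independent, randomly sampled hyperparameter assignments. The symbols (N, n, v, and the empirical CDF) are notation chosen for this illustration, not quoted from the page. The expected best score after n trials is the expected maximum of n draws from the empirical distribution of the observed scores:

```latex
% Empirical CDF of the N observed validation scores v_1, ..., v_N
\hat{F}(v) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[v_i \le v]

% Expected maximum of n i.i.d. draws from \hat{F},
% with v_{(1)} \le v_{(2)} \le \dots \le v_{(N)} the sorted scores
\mathbb{E}\left[\max(V_1, \dots, V_n)\right]
  = \sum_{i=1}^{N} v_{(i)} \left[ \left(\frac{i}{N}\right)^{n} - \left(\frac{i-1}{N}\right)^{n} \right]
```

For n = 1 the weights all equal 1/N and the expression reduces to the mean score; as n grows, the weight shifts toward the largest observed score.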

Findings

01. Validation performance reporting varies widely across papers.

02. Estimated computation time to reach certain accuracy levels varies from hours to weeks.

03. Using the proposed method can change conclusions about model superiority.
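
The last finding can be illustrated with a minimal Python sketch. It assumes the estimate is computed as the expected maximum of a given number of draws from the empirical distribution of observed validation scores. The function name `expected_max_performance`, the model labels, and every score below are hypothetical values made up for illustration; none of them come from the paper.

```python
from typing import Sequence


def expected_max_performance(scores: Sequence[float], budget: int) -> float:
    """Expected best validation score after `budget` trials.

    Treats `scores` as an empirical distribution and returns the expected
    maximum of `budget` draws (with replacement) from it.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not scores:
        raise ValueError("scores must not be empty")

    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability that the maximum of `budget` draws is the i-th smallest score.
        weight = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * weight
    return total


# Hypothetical validation accuracies from random hyperparameter search.
STABLE_MODEL = [0.70, 0.71, 0.72, 0.73]     # low variance across trials
SENSITIVE_MODEL = [0.50, 0.55, 0.60, 0.80]  # high variance, higher peak

if __name__ == "__main__":
    for budget in (1, 2, 3, 4, 8):
        stable = expected_max_performance(STABLE_MODEL, budget)
        sensitive = expected_max_performance(SENSITIVE_MODEL, budget)
        leader = "stable" if stable > sensitive else "sensitive"
        print(f"budget={budget}: stable={stable:.4f} sensitive={sensitive:.4f} leader={leader}")
```

With these made-up scores, the low-variance model leads at small budgets and the high-variance model overtakes it once enough trials are allowed, which is the kind of reversal the finding describes.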

Abstract

Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best. We argue for reporting additional details, especially performance on validation data obtained during model development. We present a novel technique for doing so: expected validation performance of the best-found model as a function of computation budget (i.e., the number of hyperparameter search trials or the overall training time). Using our approach, we find multiple recent model comparisons where authors would have reached a different conclusion if they had used more (or less) computation. Our approach also allows us to estimate the amount of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification