Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Philippe Laban, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

TL;DR
This paper introduces Near-Negative Distinction (NND), an automatic evaluation method for NLG that leverages existing human annotations to provide a cost-effective, reproducible, and accurate alternative to traditional human evaluation.
Contribution
The paper proposes NND, an automatic evaluation approach that repurposes existing human annotations for reliable, low-cost assessment of NLG models and correlates better with human judgments than standard metrics.
Findings
NND correlates better with human judgments than traditional metrics.
NND supports fine-grained model analysis and the study of training dynamics.
NND effectively reuses existing human annotations for evaluation.
Abstract
Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish a preference in a model's output over another is often necessary. However, human evaluation is usually costly, difficult to reproduce, and non-reusable. In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) that repurposes prior human annotations into NND tests. In an NND test, an NLG model must place a higher likelihood on a high-quality output candidate than on a near-negative candidate with a known error. Model performance is established by the number of NND tests a model passes, as well as the distribution over task-specific errors the model fails on. Through experiments on three NLG tasks (question generation, question answering, and summarization), we show that NND achieves a higher correlation…
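The NND test described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `NNDTest` record, the `run_nnd` function, the error labels, and the `toy_scorer` stand-in are all hypothetical names introduced here. In practice the scorer would be the log-likelihood an NLG model assigns to a candidate given its input. Counting a tie as a failure is also an assumption, chosen because the abstract requires a strictly higher likelihood on the high-quality candidate.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class NNDTest:
    """One test built from prior human annotations (hypothetical schema)."""
    context: str        # model input, e.g. the passage or source document
    positive: str       # candidate annotated as high quality
    near_negative: str  # candidate annotated with a known error
    error_type: str     # task-specific label of that error


# Assumed interface: (context, candidate) -> likelihood score of the candidate.
Scorer = Callable[[str, str], float]


def run_nnd(tests: Iterable[NNDTest], log_likelihood: Scorer) -> Tuple[float, Counter]:
    """Return the pass rate and a count of failed tests per error type."""
    passed = 0
    total = 0
    failures: Counter = Counter()
    for test in tests:
        total += 1
        pos_score = log_likelihood(test.context, test.positive)
        neg_score = log_likelihood(test.context, test.near_negative)
        if pos_score > neg_score:
            passed += 1
        else:
            # Assumption: ties count as failures.
            failures[test.error_type] += 1
    pass_rate = passed / total if total else 0.0
    return pass_rate, failures


def toy_scorer(context: str, candidate: str) -> float:
    """Stand-in scorer for demonstration only: fraction of candidate words
    that also appear in the context. A real setup would query an NLG model."""
    context_words = set(context.lower().split())
    words = candidate.lower().split()
    return sum(w in context_words for w in words) / max(len(words), 1)
```

Calling `run_nnd(tests, toy_scorer)` returns both quantities the abstract mentions: the share of tests passed and the distribution of error types among the failed tests, which can be compared across models or training checkpoints.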
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
