Rethinking and Refining the Distinct Metric
Siyang Liu, Sahand Sabour, Yinhe Zheng, Pei Ke, Xiaoyan Zhu, Minlie Huang

TL;DR
This paper introduces the Expectation-Adjusted Distinct (EAD) metric, which refines the calculation of diversity scores in language generation by removing biases related to sequence length, leading to better correlation with human judgments.
Contribution
The paper proposes a novel bias-corrected distinct score, EAD, with empirical and theoretical validation, improving diversity evaluation in language models.
Findings
EAD correlates better with human judgments
Original distinct scores are biased against longer sequences
EAD effectively removes length-related biases
Abstract
The Distinct-n score (Li et al., 2016) is a widely used automatic metric for evaluating diversity in language generation tasks. However, we observed that the original approach for calculating distinct scores has evident biases that tend to assign higher penalties to longer sequences. We refine the calculation of distinct scores by scaling the number of distinct tokens based on their expectations. We provide both empirical and theoretical evidence to show that our method effectively removes the biases existing in the original distinct score. Our experiments show that our proposed metric, Expectation-Adjusted Distinct (EAD), correlates better with human judgment in evaluating response diversity. To foster future research, we provide an example implementation at https://github.com/lsy641/Expectation-Adjusted-Distinct.
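The scaling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' reference implementation (see the repository linked above for that). It assumes unigrams only, and it assumes the expectation is taken under uniform sampling from a vocabulary of size V, in which case the expected number of distinct tokens among C tokens is V * (1 - ((V - 1) / V) ** C). The function names and the `vocab_size` parameter are illustrative choices.

```python
def distinct_1(tokens):
    """Original-style score: distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def expected_distinct(total, vocab_size):
    """Expected number of distinct tokens among `total` tokens.

    Assumption: each token is drawn uniformly and independently from a
    vocabulary of `vocab_size` types.
    """
    return vocab_size * (1.0 - ((vocab_size - 1) / vocab_size) ** total)


def ead_1(tokens, vocab_size):
    """Expectation-adjusted score: distinct tokens divided by their expectation."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / expected_distinct(len(tokens), vocab_size)


if __name__ == "__main__":
    # Under the uniform assumption, the ratio expected_distinct / length
    # shrinks as length grows, which is the length bias of the original score.
    for length in (10, 100, 1000):
        print(length, expected_distinct(length, vocab_size=100) / length)
```

Dividing by the expectation rather than by the raw token count means a long sequence is compared against how many distinct tokens a sequence of that length would be expected to contain, instead of being penalized simply for being long.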
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems
