Capturing Label Distribution: A Case Study in NLI
Shujian Zhang, Chengyue Gong, Eunsol Choi

TL;DR
This paper investigates methods to better estimate human disagreement in natural language inference, showing that post-hoc smoothing and collecting multiple references improve label distribution modeling.
Contribution
It introduces a new annotation strategy: collecting multiple references for a small number of training examples instead of a single reference per example. It compares this strategy with post-hoc smoothing as ways to estimate label distributions in NLI.
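This page does not specify how the multi-reference examples enter the training loss. One common option, assumed here purely for illustration, is to turn each example's annotations into an empirical label distribution and train against that soft target with cross-entropy. A single-reference example then reduces to the usual one-hot target. The function names below are hypothetical.

```python
import math
from collections import Counter

# Standard three-way NLI label set.
LABELS = ("entailment", "neutral", "contradiction")

def empirical_distribution(annotations, labels=LABELS):
    """Hypothetical helper: turn one example's annotator labels into a probability vector."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[label] / total for label in labels]

def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of predicted probabilities against a (possibly soft) target distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# A single-reference example gives a one-hot target.
single = empirical_distribution(["entailment"])
# A multi-reference example keeps the annotators' disagreement in the target.
multi = empirical_distribution(
    ["entailment", "entailment", "neutral", "entailment", "contradiction"]
)

loss = soft_cross_entropy(multi, [0.7, 0.2, 0.1])
```

With a one-hot target the loss is ordinary cross-entropy, so single-reference and multi-reference examples can share one training objective.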
Findings
Post-hoc smoothing reduces KL divergence by nearly half.
Collecting multiple references improves accuracy under a fixed annotation budget.
Simple smoothing does not enhance majority label prediction accuracy.
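The two findings about smoothing fit together. Suppose, as an assumption for this sketch, that smoothing is done by dividing the model's logits by a temperature greater than 1. This flattens an over-confident prediction toward the human label distribution, which lowers KL divergence. Dividing by a positive constant preserves the ordering of the logits, though, so the predicted majority label cannot change. The distributions and logits below are made-up illustrative values, not data from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats, with p the human label distribution and q the model's."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

human = [0.6, 0.2, 0.2]    # hypothetical annotator label distribution
logits = [4.0, 0.5, 0.0]   # hypothetical over-confident model logits

sharp = softmax(logits)
smoothed = softmax(logits, temperature=3.0)

print(kl_divergence(human, sharp), kl_divergence(human, smoothed))
print(argmax(sharp) == argmax(smoothed))  # a positive temperature keeps the top label
```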
Abstract
We study estimating inherent human disagreement (the annotation label distribution) in the natural language inference task. Post-hoc smoothing of the predicted label distribution to match the expected label entropy is very effective. Such a simple manipulation can reduce KL divergence by almost half, yet it does not improve majority label prediction accuracy or learn label distributions. To address this, we introduce a small number of examples with multiple references into training. We depart from the standard practice of collecting a single reference per training example, and find that collecting multiple references can achieve better accuracy under a fixed annotation budget. Lastly, we provide rich analyses comparing these two methods for improving label distribution estimation.
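The abstract states that the predicted distribution is smoothed post hoc "to match the expected label entropy" but does not give the procedure. One plausible reading, assumed here, is to fit a single temperature on held-out data so that the model's mean predicted entropy equals the mean entropy of the human label distributions. Entropy of a temperature-scaled softmax does not decrease as the temperature grows, so bisection works. All names and numbers below are hypothetical.

```python
import math

def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mean_entropy(all_logits, temperature):
    return sum(entropy(softmax(z, temperature)) for z in all_logits) / len(all_logits)

def fit_temperature(all_logits, target_entropy, lo=1e-2, hi=1e2, iters=60):
    """Bisection on temperature; assumes the target lies between the entropies at lo and hi."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect in log space because T can span orders of magnitude
        if mean_entropy(all_logits, mid) < target_entropy:
            lo = mid  # predictions still too sharp: raise the temperature
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical held-out logits and human label distributions.
dev_logits = [[4.0, 0.5, 0.0], [0.2, 3.5, 0.1], [1.0, 0.8, 2.5]]
human_dists = [[0.6, 0.2, 0.2], [0.2, 0.8, 0.0], [0.2, 0.2, 0.6]]

target = sum(entropy(p) for p in human_dists) / len(human_dists)
T = fit_temperature(dev_logits, target)
```

A single global temperature keeps the method post hoc. It needs no retraining and adds only one parameter fitted on held-out data.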
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
