Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations
Xu Han, Felix Yu, Joao Sedoc, Benjamin Van Durme

TL;DR
This paper introduces IBWS, an iterative Best-Worst Scaling method for collecting scalar annotations. It then uses IBWS results as a reference to evaluate cheaper direct assessment methods for large-scale data collection, and illustrates how the resulting annotations support learning-to-rank models in dialogue and sentiment.
Contribution
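The paper presents IBWS, an iterative BWS method that yields robustly ranked crowd-sourced data, and evaluates direct assessment methods against it to identify ones that are both cost-efficient and well correlated with a large-scale BWS annotation strategy.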
Findings
IBWS produces robustly ranked crowd-sourced data, but is too expensive for large-scale tasks.
Certain direct assessment methods closely correlate with BWS results.
The resulting annotations can support robust learning-to-rank models in the dialogue and sentiment domains.
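As an illustration of what "correlating with BWS results" can mean in practice, the sketch below computes a Spearman rank correlation between two sets of scores for the same items. The choice of Spearman's coefficient, the function names, and the example scores are all assumptions made for illustration; this summary does not state which correlation measure the paper uses.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five items (made-up numbers, not from the paper).
bws_scores_example = [1.0, 0.33, 0.0, -0.33, -1.0]
direct_scores_example = [90, 70, 40, 55, 10]
rho = spearman(bws_scores_example, direct_scores_example)
```

A value of `rho` near 1 would indicate that the direct assessment orders the items almost the same way as the BWS-derived scores.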
Abstract
Our goal is a mechanism for efficiently assigning scalar ratings to each of a large set of elements. For example, "what percent positive or negative is this product review?" When sample sizes are small, prior work has advocated for methods such as Best Worst Scaling (BWS) as being more robust than direct ordinal annotation ("Likert scales"). Here we first introduce IBWS, which iteratively collects annotations through Best-Worst Scaling, resulting in robustly ranked crowd-sourced data. While effective, IBWS is too expensive for large-scale tasks. Using the results of IBWS as a best-desired outcome, we evaluate various direct assessment methods to determine what is both cost-efficient and best correlating to a large scale BWS annotation strategy. Finally, we illustrate in the domains of dialogue and sentiment how these annotations can support robust learning-to-rank models.
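To make Best-Worst Scaling concrete, here is a minimal sketch of the common counting procedure for turning BWS judgments into scalar scores: each item's score is the number of times it was chosen best, minus the number of times it was chosen worst, divided by the number of times it was shown. This is the generic BWS counting scheme, not the paper's iterative IBWS procedure, whose details are not given in this summary. The function name `bws_scores` and the example annotations are hypothetical.

```python
from collections import defaultdict


def bws_scores(annotations):
    """Score items from BWS judgments.

    annotations: iterable of (items, best, worst) tuples, where `items` is
    the tuple shown to the annotator and `best` / `worst` are their picks.
    Returns a dict mapping item -> (best count - worst count) / appearances,
    a value in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, picked_best, picked_worst in annotations:
        for item in items:
            seen[item] += 1
        best[picked_best] += 1
        worst[picked_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


# Hypothetical judgments over 4-tuples of items "a" .. "e".
annotations = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "e"), "a", "c"),
    (("b", "c", "d", "e"), "b", "d"),
]
scores = bws_scores(annotations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score gives a full ranking of the items, which is the kind of output that can then serve as a reference when comparing cheaper annotation schemes.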
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training
