Single Ground Truth Is Not Enough: Adding Flexibility to Aspect-Based Sentiment Analysis Evaluation
Soyoung Yang, Hojun Cho, Jiyoung Lee, Sohee Yoon, Edward Choi, Jaegul Choo, Won Ik Cho

TL;DR
This paper introduces a flexible, automated evaluation framework for aspect-based sentiment analysis that accounts for multiple valid surface forms, improving assessment accuracy and revealing model capabilities hidden by traditional single-ground-truth methods.
Contribution
It proposes a novel pipeline to expand evaluation sets with alternative valid terms, enabling more equitable and comprehensive assessment of ABSA models.
Findings
Expanded evaluation sets improve agreement with human judgments by up to 10 percentage points in Kendall's Tau compared with single-answer test sets.
The approach uncovers LLM capabilities in ABSA that are hidden by single-answer ground truths.
The method is cost-effective, reproducible, and enhances the evaluation of surface form variability.
Abstract
Aspect-based sentiment analysis (ABSA) is the challenging task of extracting sentiments along with their corresponding aspects and opinion terms from text. The inherent subjectivity of span annotation introduces variability in the surface forms of extracted terms, complicating the evaluation process. Traditional evaluation methods often constrain ground truths (GT) to a single term, potentially misrepresenting the accuracy of semantically valid predictions that differ in surface form. To address this limitation, we propose a novel and fully automated pipeline that expands existing evaluation sets by adding alternative valid terms for aspect and opinion. Our approach facilitates an equitable assessment of language models by accommodating multiple-answer candidates, resulting in enhanced human agreement compared to single-answer test sets (achieving up to a 10%p improvement in Kendall's Tau…
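The contrast between single-answer and multiple-answer evaluation described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the triplet layout, the `normalize` helper, the `flexible_f1` function, and the example terms are all hypothetical, and the sketch assumes that a prediction counts as correct when its aspect and opinion each match any listed surface form and its sentiment label matches exactly.

```python
from typing import List, Set, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
# a gold triplet holds a SET of acceptable surface forms for the aspect
# and for the opinion, plus a single sentiment label.
GoldTriplet = Tuple[Set[str], Set[str], str]
PredTriplet = Tuple[str, str, str]


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(term.lower().split())


def matches(pred: PredTriplet, gold: GoldTriplet) -> bool:
    """True if the prediction agrees with any accepted surface form."""
    aspect, opinion, sentiment = pred
    gold_aspects, gold_opinions, gold_sentiment = gold
    return (
        normalize(aspect) in {normalize(a) for a in gold_aspects}
        and normalize(opinion) in {normalize(o) for o in gold_opinions}
        and sentiment == gold_sentiment
    )


def flexible_f1(preds: List[PredTriplet], golds: List[GoldTriplet]) -> float:
    """F1 where each gold triplet can be matched by at most one prediction."""
    matched_gold = set()
    true_positives = 0
    for pred in preds:
        for i, gold in enumerate(golds):
            if i not in matched_gold and matches(pred, gold):
                matched_gold.add(i)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative example (invented terms): the prediction is semantically valid
# but differs in surface form from the single annotated answer.
preds = [("battery", "really great", "positive")]
single_gt = [({"battery life"}, {"great"}, "positive")]
expanded_gt = [({"battery life", "battery"}, {"great", "really great"}, "positive")]

single_score = flexible_f1(preds, single_gt)
expanded_score = flexible_f1(preds, expanded_gt)
```

With the single-term ground truth the prediction is scored as wrong, while the expanded set accepts it. Representing the alternatives as sets keeps the scoring logic identical in both cases, since a single-answer test set is simply the special case where each set has one element.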
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Inverse Square Root Schedule · Adafactor · Multi-Head Attention · Dense Connections · Residual Connection
