ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation
Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui

TL;DR
ACORN introduces a dataset with aspect-wise quality ratings for explanations and evaluates how large language models assess these explanations, highlighting their potential as supplementary tools alongside human raters.
Contribution
The paper presents ACORN, a new dataset for evaluating explanation quality, and analyzes the effectiveness of LLMs in rating explanations compared to human raters.
Findings
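Labels from larger LLMs maintained or increased inter-annotator agreement, suggesting they fall within the expected variance between human raters.
LLMs' correlation with majority-voted human ratings varied across quality aspects, so they are not a complete replacement for human raters.
Supplementing a smaller group of human raters with LLMs in some cases improved correlation with the original majority labels.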
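As a rough illustration of the comparison behind these findings, the sketch below aggregates several human ratings per explanation by majority vote and correlates the result with a model's ratings for one aspect. It is a minimal sketch under stated assumptions, not the paper's evaluation code: this summary does not name the correlation metric, so Spearman rank correlation is assumed here, and the tie-breaking rule, the function names, and all ratings are hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_vote(ratings):
    """Return the most common rating; ties go to the lowest value (an assumption)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up ratings for one aspect: one row per explanation, one column per human rater.
human = [[1, 2, 1], [3, 3, 2], [2, 2, 2], [1, 1, 3], [3, 2, 3]]
# Hypothetical LLM ratings for the same five explanations.
llm = [1, 3, 2, 2, 3]

gold = [majority_vote(row) for row in human]
print("majority labels:", gold)
print("Spearman vs. LLM:", round(spearman(gold, llm), 3))
```

Repeating this per quality aspect would show whether the correlation differs from one aspect to another, which is the kind of variation the second finding describes.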
Abstract
Evaluating the quality of free-text explanations is a multifaceted, subjective, and labor-intensive task. Large language models (LLMs) present an appealing alternative due to their potential for consistency, scalability, and cost-efficiency. In this work, we present ACORN, a new dataset of 3,500 free-text explanations and aspect-wise quality ratings, and use it to evaluate how LLMs rate explanations. We observed that larger models outputted labels that maintained or increased the inter-annotator agreement, suggesting that they are within the expected variance between human raters. However, their correlation with majority-voted human ratings varied across different quality aspects, indicating that they are not a complete replacement. In turn, using LLMs as a supplement to a smaller group of human raters in some cases improved the correlation with the original majority labels. However,…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Scientific Computing and Data Management
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Softmax · Absolute Position Encodings · Byte Pair Encoding
