ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation

Ana Brassard; Benjamin Heinzerling; Keito Kudo; Keisuke Sakaguchi; Kentaro Inui

arXiv:2405.04818 · cs.CL · September 4, 2024

TL;DR

ACORN introduces a dataset with aspect-wise quality ratings for explanations and evaluates how large language models assess these explanations, highlighting their potential as supplementary tools alongside human raters.

Contribution

The paper presents ACORN, a new dataset for evaluating explanation quality, and analyzes the effectiveness of LLMs in rating explanations compared to human raters.

Findings

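01  Labels from larger LLMs maintained or increased inter-annotator agreement, which suggests they fall within the expected variance between human raters.

02  LLM correlation with majority-voted human ratings varied across quality aspects, so LLMs are not a complete replacement for human raters.

03  Adding an LLM to a smaller group of human raters improved correlation with the original majority labels in some cases.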
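
The comparisons in the findings above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the rating scale, the number of raters, the tie-breaking rule, and all function and variable names are assumptions made for the example. Spearman rank correlation is used as a stand-in correlation measure, since the listing does not say which one the paper used.

```python
from collections import Counter
from statistics import mean


def majority_vote(ratings):
    """Return the most common rating; ties go to the lowest value (arbitrary choice)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one inner list per explanation, one score per human rater.
human = [[1, 1, 2, 1, 1], [3, 3, 3, 2, 3], [5, 4, 5, 5, 5], [2, 2, 3, 2, 2]]
llm = [1, 3, 4, 2]  # hypothetical LLM scores for the same explanations

# Reference labels: majority vote over all human raters.
full_majority = [majority_vote(r) for r in human]

# Scarce-rater setting: keep two humans and add the LLM as a third rater.
mixed_majority = [majority_vote(r[:2] + [m]) for r, m in zip(human, llm)]

print("LLM vs. full majority:", spearman(llm, full_majority))
print("Mixed vs. full majority:", spearman(mixed_majority, full_majority))
```

Repeating the comparison separately for each quality aspect would show where the LLM tracks the human majority and where it does not.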

Abstract

Evaluating the quality of free-text explanations is a multifaceted, subjective, and labor-intensive task. Large language models (LLMs) present an appealing alternative due to their potential for consistency, scalability, and cost-efficiency. In this work, we present ACORN, a new dataset of 3,500 free-text explanations and aspect-wise quality ratings, and use it to evaluate how LLMs rate explanations. We observed that larger models outputted labels that maintained or increased the inter-annotator agreement, suggesting that they are within the expected variance between human raters. However, their correlation with majority-voted human ratings varied across different quality aspects, indicating that they are not a complete replacement. In turn, using LLMs as a supplement to a smaller group of human raters in some cases improved the correlation with the original majority labels. However,…

Code & Models

Repositories

a-brassard/acorn (Official)

Datasets

anab/ACORN (dataset · 75 downloads)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Scientific Computing and Data Management

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Softmax · Absolute Position Encodings · Byte Pair Encoding