EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI
Longfei Zuo, Barbara Plank, Siyao Peng

TL;DR
This paper introduces EVADE, a framework using large language models to generate and validate explanations for error detection in NLI datasets, aiming to scale error detection and improve dataset quality efficiently.
Contribution
EVADE leverages LLMs for explanation generation and validation in NLI, reducing reliance on costly manual annotation and enhancing error detection accuracy.
Findings
LLM validation aligns explanation distributions with human annotations
Removing LLM-detected errors improves fine-tuning performance
EVADE reduces human effort in dataset error detection
Abstract
High-quality datasets are critical for training and evaluating reliable NLP models. In tasks like natural language inference (NLI), human label variation (HLV) arises when multiple labels are valid for the same instance, making it difficult to separate annotation errors from plausible variation. An earlier framework, VARIERR (Weber-Genzel et al., 2024), asks multiple annotators to explain their label decisions in a first round and to flag errors via validity judgments in a second round. However, conducting two rounds of manual annotation is costly and may limit the coverage of plausible labels or explanations. Our study proposes a new framework, EVADE, for generating and validating explanations to detect errors using large language models (LLMs). We perform a comprehensive analysis comparing human- and LLM-detected errors for NLI across distribution comparison, validation overlap, and…
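The explain-then-validate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a label is flagged as a likely error when none of its explanations is judged valid, which is one plausible reading of "flag errors via validity judgments". The names `Explanation`, `detect_errors`, and `toy_validator` are hypothetical, and the validator is a keyword stand-in for what would be an LLM (or human) validity judgment.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, NamedTuple


class Explanation(NamedTuple):
    """One label-explanation pair for a single NLI item (hypothetical structure)."""
    label: str
    text: str


def detect_errors(
    explanations: Iterable[Explanation],
    validate: Callable[[Explanation], bool],
) -> List[str]:
    """Return the labels for which no explanation was judged valid.

    Assumption: a label counts as a likely annotation error only if
    every explanation given for it fails validation.
    """
    votes_by_label = defaultdict(list)
    for exp in explanations:
        votes_by_label[exp.label].append(validate(exp))
    return sorted(
        label for label, votes in votes_by_label.items() if not any(votes)
    )


def toy_validator(exp: Explanation) -> bool:
    """Stand-in for an LLM validity judgment: accepts explanations that give a reason."""
    return "because" in exp.text.lower()


item_explanations = [
    Explanation("entailment", "Because the premise states it directly."),
    Explanation("neutral", "Unclear."),
    Explanation("neutral", "Because the hypothesis adds unstated information."),
    Explanation("contradiction", "Just a guess."),
]

flagged = detect_errors(item_explanations, toy_validator)
print(flagged)
```

In this sketch, "neutral" survives because one of its two explanations passes validation, while "contradiction" is flagged because its only explanation fails. Swapping `toy_validator` for a real validity judgment leaves the aggregation logic unchanged.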
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
