EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A
Shijian Ma (1), Yan Lin (2), Yi Yang (1) ((1) The Hong Kong University of Science and Technology, Hong Kong SAR, China, (2) University of Macau, Macau SAR, China)

TL;DR
EvasionBench is a large-scale benchmark dataset and evaluation framework for detecting evasive responses in earnings call Q&A sessions. It pairs a Multi-Model Consensus (MMC) annotation pipeline with Eva-4B, a 4-billion-parameter classifier fine-tuned from Qwen3-4B.
Contribution
The paper introduces EvasionBench, the first large-scale benchmark for detecting managerial evasion, with a comprehensive dataset, a multi-model consensus annotation pipeline, and a high-performing classifier.
Findings
The Multi-Model Consensus (MMC) annotation pipeline achieves a Cohen's Kappa of 0.835 on human inter-annotator agreement.
The Eva-4B classifier reaches 84.9% Macro-F1.
Multi-model consensus labeling outperforms single-model annotation.
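The two metrics quoted above can be computed as in the following minimal sketch. It is not the authors' evaluation code: the function names, the label strings, and the toy label lists are assumptions made for illustration, and only the metric definitions themselves (Cohen's Kappa as chance-corrected agreement, Macro-F1 as the unweighted mean of per-class F1) are standard.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-class F1 scores over the given label set."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical, not drawn from EvasionBench).
rater_1 = ["direct", "direct", "fully evasive", "fully evasive"]
rater_2 = ["direct", "fully evasive", "fully evasive", "fully evasive"]
kappa = cohens_kappa(rater_1, rater_2)
f1 = macro_f1(rater_1, rater_2, ["direct", "fully evasive"])
```

Because Macro-F1 weights every class equally, a classifier cannot score well by favoring the most frequent evasion level alone.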
Abstract
We present EvasionBench, a comprehensive benchmark for detecting evasive responses in corporate earnings call question-and-answer sessions. Drawing from 22.7 million Q&A pairs extracted from S&P Capital IQ transcripts, we construct a rigorously filtered dataset and introduce a three-level evasion taxonomy: direct, intermediate, and fully evasive. Our annotation pipeline employs a Multi-Model Consensus (MMC) framework, combining dual frontier LLM annotation with a three-judge majority voting mechanism for ambiguous cases, achieving a Cohen's Kappa of 0.835 on human inter-annotator agreement. We release: (1) a balanced 84K training set, (2) a 1K gold-standard evaluation set with expert human labels, and (3) Eva-4B, a 4-billion-parameter classifier fine-tuned from Qwen3-4B that achieves 84.9% Macro-F1, outperforming Claude 4.5, GPT-5.2, and Gemini 3 Flash. Our ablation studies…
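The consensus step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's pipeline: it assumes that a case counts as "ambiguous" when the two annotators disagree, that each model can be wrapped as a plain function from a question and an answer to a label, and that a pair with no judge majority is left unlabeled. The names `consensus_label` and `Labeler` and the stub models are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical interface: a labeler maps (question, answer) to a taxonomy label.
Labeler = Callable[[str, str], str]


def consensus_label(
    question: str,
    answer: str,
    annotators: Sequence[Labeler],
    judges: Sequence[Labeler],
) -> Optional[str]:
    """Return the agreed label, or None when no majority emerges."""
    if len(annotators) != 2 or len(judges) != 3:
        raise ValueError("expected two annotators and three judges")
    first, second = (annotate(question, answer) for annotate in annotators)
    if first == second:
        return first  # both annotators agree, so no judges are consulted
    # Assumed ambiguity rule: disagreement escalates to a three-judge vote.
    votes = Counter(judge(question, answer) for judge in judges)
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else None


# Stub labelers standing in for LLM calls (assumptions for illustration).
says_direct = lambda q, a: "direct"
says_intermediate = lambda q, a: "intermediate"
says_evasive = lambda q, a: "fully evasive"

agreed = consensus_label("q", "a", [says_direct, says_direct],
                         [says_evasive, says_evasive, says_evasive])
voted = consensus_label("q", "a", [says_direct, says_evasive],
                        [says_evasive, says_intermediate, says_evasive])
split = consensus_label("q", "a", [says_direct, says_evasive],
                        [says_direct, says_intermediate, says_evasive])
```

Returning `None` for a three-way split keeps unresolved pairs out of the labeled set rather than forcing an arbitrary label; how the authors handle such cases is not stated in the text above.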
Taxonomy
Topics: Auditing, Earnings Management, Governance · Explainable Artificial Intelligence (XAI) · Expert finding and Q&A systems
