FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
Yilun Zhao, Yitao Long, Yuru Jiang, Chengye Wang, Weiyuan Chen, Hongjun Liu, Yiming Zhang, Xiangru Tang, Chen Zhao, Arman Cohan

TL;DR
FinDVer is a new benchmark for evaluating large language models' ability to verify claims and provide explanations in complex, long, and hybrid financial documents, highlighting current limitations and guiding future improvements.
Contribution
The paper introduces FinDVer, a comprehensive benchmark with expert-annotated data for assessing explainable claim verification in financial documents using LLMs.
Findings
GPT-4o performs below human experts in claim verification.
Long-context and RAG settings pose significant challenges for LLMs.
Analysis reveals common reasoning errors in current models.
Abstract
We introduce FinDVer, a comprehensive benchmark specifically designed to evaluate the explainable claim verification capabilities of LLMs in the context of understanding and analyzing long, hybrid-content financial documents. FinDVer contains 2,400 expert-annotated examples, divided into three subsets: information extraction, numerical reasoning, and knowledge-intensive reasoning, each addressing common scenarios encountered in real-world financial contexts. We assess a broad spectrum of LLMs under long-context and RAG settings. Our results show that even the current best-performing system, GPT-4o, still lags behind human experts. We further provide in-depth analysis of the long-context and RAG settings, Chain-of-Thought reasoning, and model reasoning errors, offering insights to drive future advancements. We believe that FinDVer can serve as a valuable benchmark for evaluating LLMs in claim…
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Digital Humanities and Scholarship · Digital Rights Management and Security
Methods: Attention Is All You Need · Dropout · Dense Connections · Layer Normalization · Adam · Attention Dropout · Linear Layer · Weight Decay · Linear Warmup With Linear Decay
