FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain
Suifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao

TL;DR
This paper introduces FinRAGBench-V, a comprehensive benchmark for multimodal Retrieval-Augmented Generation in finance, emphasizing visual content integration, traceability through visual citation, and evaluation of multimodal models.
Contribution
It presents a new multimodal finance benchmark with a bilingual corpus, a high-quality QA dataset, a baseline model RGenCite, and an automatic citation evaluation method.
Findings
FinRAGBench-V is challenging for current models.
Visual citation improves traceability in financial RAG.
Multimodal models still have significant room for improvement.
Abstract
Retrieval-Augmented Generation (RAG) plays a vital role in the financial domain, powering applications such as real-time market analysis, trend forecasting, and interest rate computation. However, most existing RAG research in finance focuses predominantly on textual data, overlooking the rich visual content in financial documents, resulting in the loss of key analytical insights. To bridge this gap, we present FinRAGBench-V, a comprehensive visual RAG benchmark tailored for finance which effectively integrates multimodal data and provides visual citation to ensure traceability. It includes a bilingual retrieval corpus with 60,780 Chinese and 51,219 English pages, along with a high-quality, human-annotated question-answering (QA) dataset spanning heterogeneous data types and seven question categories. Moreover, we introduce RGenCite, an RAG baseline that seamlessly integrates visual…
Taxonomy
Methods
Layer Normalization · Linear Warmup With Linear Decay · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer · Dropout · Dense Connections · Attention Is All You Need
