TL;DR
MAB-DQA applies a multi-armed bandit to document question answering, dynamically prioritizing the implicit aspects of a query during retrieval. It reports 5%-18% improvements over state-of-the-art methods on multiple benchmarks.
Contribution
It proposes a novel aspect-aware retrieval framework using multi-armed bandits to better utilize multiple implicit query aspects in multimodal DQA.
Findings
Achieves 5%-18% improvement over state-of-the-art methods.
Effectively models query aspect importance for better retrieval.
Enhances document understanding in multimodal DQA.
Abstract
Document Question Answering (DQA) involves generating answers from a document based on a user's query, representing a key task in document understanding. This task requires interpreting visual layouts, which has prompted recent studies to adopt multimodal Retrieval-Augmented Generation (RAG) that processes page images for answer generation. However, in multimodal RAG, visual DQA struggles to utilize a large number of images effectively, as the retrieval stage often retains only a few candidate pages (e.g., Top-4), causing informative but less visually salient content to be overlooked in favor of common yet low-information pages. To address this issue, we propose a Multi-Armed Bandit-based DQA framework (MAB-DQA) to explicitly model the varying importance of multiple implicit aspects in a query. Specifically, MAB-DQA decomposes a query into aspect-aware subqueries and retrieves an…
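The abstract is cut off before it describes the bandit policy, so the following is only a minimal sketch of the general idea: each aspect-aware subquery is treated as an arm, and a standard UCB1 rule decides which aspect receives more of the retrieval budget. UCB1 is a stand-in chosen for illustration, not necessarily the policy MAB-DQA uses. The names `ucb1_select` and `run_bandit` and the fixed per-aspect rewards are hypothetical. In a real system the reward would come from some relevance signal on the retrieved pages.

```python
import math


def ucb1_select(counts, totals, t, c=math.sqrt(2)):
    """Return the index of the arm (query aspect) with the highest UCB1 score.

    counts[a] is how often aspect a was pulled, totals[a] its summed reward,
    and t the current round (1-based).
    """
    # Pull every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(aspect_rewards, rounds):
    """Allocate `rounds` retrieval steps across aspects.

    aspect_rewards is a hypothetical fixed reward per aspect, standing in
    for whatever relevance feedback an actual retriever would provide.
    """
    k = len(aspect_rewards)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        counts[arm] += 1
        totals[arm] += aspect_rewards[arm]
    return counts


if __name__ == "__main__":
    # Three hypothetical aspects; the second is assumed to be the most informative.
    print(run_bandit([0.2, 0.9, 0.4], rounds=200))
```

Under these assumptions the most rewarding aspect ends up with the largest share of pulls, while every aspect is still explored at least once. That mirrors the stated goal of not letting informative but less salient content be dropped by a fixed Top-k cutoff.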
