BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering
Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne

TL;DR
BERAG introduces a probabilistic, document-specific approach to retrieval-augmented generation, improving attribution, scalability, and performance in knowledge-based visual question answering tasks.
Contribution
The paper proposes the BERAG and BEFT frameworks, which condition on individual documents rather than on a single concatenated context, enabling better attribution, re-ranking, and scalability than traditional concatenation methods.
Findings
BERAG outperforms standard RAG on visual question answering benchmarks.
The approach mitigates the 'lost-in-the-middle' effect in long contexts.
Document posterior enables detection of insufficient grounding and faster decoding.
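The document posterior mentioned in the findings can be illustrated with a generic Bayesian mixture over retrieved documents. The sketch below is not taken from the paper: the function names, the uniform prior, and the toy probabilities are all assumptions made for illustration. It reweights each document by how well it explains the tokens generated so far, then mixes the per-document next-token distributions with those weights.

```python
import math

def softmax(log_weights):
    """Normalize log-weights into probabilities (numerically stable)."""
    m = max(log_weights)
    exps = [math.exp(w - m) for w in log_weights]
    total = sum(exps)
    return [e / total for e in exps]

def update_document_posterior(log_prior, token_logprobs_per_doc):
    """Hypothetical helper: posterior over documents given the tokens so far.

    log_prior[k]              -- log p(d_k | query), e.g. from a retriever score
    token_logprobs_per_doc[k] -- log p(y_t | query, d_k, y_<t) for each generated token
    """
    log_joint = [lp + sum(tok) for lp, tok in zip(log_prior, token_logprobs_per_doc)]
    return softmax(log_joint)

def ensemble_next_token(posterior, next_token_probs_per_doc):
    """Hypothetical helper: posterior-weighted mixture of per-document predictions."""
    vocab_size = len(next_token_probs_per_doc[0])
    return [
        sum(w * probs[v] for w, probs in zip(posterior, next_token_probs_per_doc))
        for v in range(vocab_size)
    ]

# Toy example (assumed values): 3 documents, uniform prior, 2 tokens generated so far.
log_prior = [math.log(1 / 3)] * 3
token_logprobs = [
    [math.log(0.9), math.log(0.8)],  # document 0 explains the answer prefix well
    [math.log(0.1), math.log(0.2)],  # document 1 explains it poorly
    [math.log(0.3), math.log(0.3)],
]
posterior = update_document_posterior(log_prior, token_logprobs)

# Per-document next-token distributions over a toy 2-word vocabulary.
next_token_probs = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
mixture = ensemble_next_token(posterior, next_token_probs)
```

In a setup like this, the posterior weights double as attribution scores, since each weight is tied to exactly one document. How BERAG actually uses the posterior to detect insufficient grounding or to speed up decoding is not specified in this summary.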
Abstract
A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to the "lost-in-the-middle" effect, where relevant information in long contexts is overlooked. Concatenation also scales poorly: computational cost grows quadratically with context length, a problem that becomes especially severe when the context includes visual data, as in visual question answering. Attempts to mitigate these issues by limiting context length can further restrict performance by preventing models from benefiting from the improved recall offered by deeper retrieval. We propose Bayesian Ensemble Retrieval-Augmented Generation (BERAG), along with Bayesian…
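The quadratic-scaling argument can be made concrete with a rough count of pairwise token interactions in self-attention. This is a back-of-the-envelope sketch, not the paper's cost model: it assumes every document has the same length, that the query is repeated in each per-document pass, and that all other costs are ignored. The function names are hypothetical.

```python
def concat_attention_cost(num_docs, doc_len, query_len=0):
    """Pairwise token interactions when all documents share one context."""
    total_len = num_docs * doc_len + query_len
    return total_len ** 2

def per_document_attention_cost(num_docs, doc_len, query_len=0):
    """Pairwise token interactions when each document is processed in its own pass."""
    return num_docs * (doc_len + query_len) ** 2

# Toy numbers (assumed): 10 retrieved documents of 500 tokens each, no query tokens.
concat_cost = concat_attention_cost(num_docs=10, doc_len=500)          # (10 * 500)^2
separate_cost = per_document_attention_cost(num_docs=10, doc_len=500)  # 10 * 500^2
ratio = concat_cost / separate_cost
```

Under these assumptions the concatenated context costs K times more than K separate passes, so the gap widens as retrieval goes deeper.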
