Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering

Jongha Kim; Byungoh Ko; Jeehye Na; Jinsung Yoon; Hyunwoo J. Kim

arXiv:2602.06050 · cs.CL · February 9, 2026

TL;DR

This paper introduces RMCD, a relevance-aware decoding method for retrieval-augmented visual question answering. RMCD weights the output predicted with each retrieved context by that context's relevance to the question, and it requires no additional training.

Contribution

The paper proposes a novel decoding approach that weights multiple contexts by relevance, enhancing retrieval-augmented VQA performance without extra training.

Findings

01. RMCD outperforms existing decoding methods on three benchmarks.

02. RMCD is robust to retrieval quality variations.

03. RMCD can be applied without additional training.

Abstract

Despite the remarkable capabilities of Large Vision Language Models (LVLMs), they still lack detailed knowledge about specific entities. Retrieval-augmented Generation (RAG) is a widely adopted solution that enhances LVLMs by providing additional contexts from an external Knowledge Base. However, we observe that previous decoding methods for RAG are sub-optimal, as they neither sufficiently leverage multiple relevant contexts nor suppress the negative effects of irrelevant contexts. To this end, we propose Relevance-aware Multi-context Contrastive Decoding (RMCD), a novel decoding method for RAG. RMCD outputs a final prediction by combining outputs predicted with each context, where each output is weighted based on its relevance to the question. By doing so, RMCD effectively aggregates useful information from multiple relevant contexts while also counteracting the negative effects of…
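
The abstract describes the core idea only at a high level: one output is predicted per retrieved context, and the outputs are combined with relevance-based weights. The following is a minimal sketch of that idea, not the paper's actual formulation. The function name `combine_contexts`, the threshold `tau`, the scale `alpha`, and the specific weighting rule (relevance minus a threshold, so that low-relevance contexts receive a negative, contrastive weight) are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_contexts(base_logits, context_logits, relevance, tau=0.5, alpha=1.0):
    """Hypothetical relevance-weighted contrastive combination.

    base_logits:    next-token logits predicted without any retrieved context.
    context_logits: one list of next-token logits per retrieved context.
    relevance:      one score in [0, 1] per context (assumed to be given).
    tau, alpha:     illustrative hyperparameters, not taken from the paper.
    """
    # Assumption: contexts scoring above tau get a positive weight,
    # contexts scoring below tau get a negative (contrastive) weight.
    signed = [r - tau for r in relevance]
    norm = sum(abs(s) for s in signed)
    if norm == 0.0:
        # No context is distinguishable from neutral: fall back to the base output.
        return softmax(base_logits)

    combined = list(base_logits)
    for s, ctx in zip(signed, context_logits):
        for i, (c, b) in enumerate(zip(ctx, base_logits)):
            # Shift the base logits by the weighted change each context induces.
            combined[i] += alpha * (s / norm) * (c - b)
    return softmax(combined)


# Toy vocabulary of three tokens. The first context (relevance 0.9) favors
# token 0; the second context (relevance 0.1) favors token 1.
probs = combine_contexts(
    base_logits=[0.0, 0.0, 0.0],
    context_logits=[[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
    relevance=[0.9, 0.1],
)
```

In this toy case the token favored by the high-relevance context ends up most probable, while the token favored by the low-relevance context is pushed below the untouched token. How the real method computes relevance and sets the weights is not stated in the truncated abstract.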

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling