Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Cheng Tan, Jingxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

TL;DR
This paper introduces RMR, a multimodal retrieval and reasoning framework that enhances vision-language models' ability to reason with external knowledge, significantly improving performance on science and general knowledge benchmarks.
Contribution
The paper presents RMR, a novel retrieval-augmented reasoning framework for multimodal models, leveraging question-answer pairs to improve reasoning without additional training.
Findings
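RMR significantly improves model performance across multiple science and general-knowledge benchmarks.
Retrieving from high-school science curriculum data enhances the models' reasoning capabilities.
The approach is training-free: the gains come from retrieval alone, with no additional model training.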
Abstract
Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of multimodal RAG is to cultivate the models' ability to reason in response to relevant queries. To this end, we introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning). The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs, which then serve as scaffolds for the multimodal reasoning process. This training-free approach not only encourages the model to engage deeply with the reasoning processes inherent in the retrieved…
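The abstract describes the bi-modal retrieval module only at a high level, so the following is a minimal sketch of one plausible reading, not RMR's actual implementation. It assumes each knowledge-base entry stores a question, a worked rationale, an answer, and precomputed text and image embeddings from some unspecified encoder. It also assumes the two modalities are fused by a weighted sum of cosine similarities. The names `QAPair`, `bimodal_score`, `retrieve`, `build_prompt`, the weight `alpha`, and the prompt template are all hypothetical. The top-k pairs are then formatted as in-context demonstrations, which is one way retrieved pairs could act as reasoning scaffolds without any training.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


@dataclass
class QAPair:
    # Hypothetical knowledge-base record; field names are illustrative only.
    question: str
    rationale: str          # worked reasoning reused as a scaffold
    answer: str
    text_emb: List[float]   # question-text embedding (encoder assumed)
    image_emb: List[float]  # image embedding (encoder assumed)


def bimodal_score(q_text: Sequence[float], q_image: Sequence[float],
                  item: QAPair, alpha: float = 0.5) -> float:
    # Assumed fusion rule: weighted sum of text-text and image-image similarity.
    return (alpha * cosine(q_text, item.text_emb)
            + (1 - alpha) * cosine(q_image, item.image_emb))


def retrieve(q_text: Sequence[float], q_image: Sequence[float],
             kb: List[QAPair], k: int = 2, alpha: float = 0.5) -> List[QAPair]:
    # Rank every entry by the fused score and keep the k best.
    ranked = sorted(kb, key=lambda it: bimodal_score(q_text, q_image, it, alpha),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, examples: List[QAPair]) -> str:
    # Retrieved pairs become in-context demonstrations; no weights are updated.
    parts = [f"Question: {ex.question}\nReasoning: {ex.rationale}\nAnswer: {ex.answer}"
             for ex in examples]
    parts.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy 2-D embeddings stand in for real encoder outputs.
    kb = [
        QAPair("Which pole attracts a north pole?", "Opposite poles attract.", "south",
               [1.0, 0.0], [1.0, 0.0]),
        QAPair("At what temperature does water boil at sea level?",
               "At standard pressure water boils at 100 C.", "100 C",
               [0.0, 1.0], [0.0, 1.0]),
        QAPair("Will these two magnets repel?", "Like poles repel.", "yes",
               [0.9, 0.1], [0.8, 0.2]),
    ]
    top = retrieve([1.0, 0.0], [1.0, 0.0], kb, k=2)
    print([ex.answer for ex in top])  # prints ['south', 'yes']
    print(build_prompt("Which way will the compass needle point?", top))
```

Weighting both modalities equally (`alpha=0.5`) is only a neutral default for the sketch. A real system might tune it, or score image-question pairs jointly with a multimodal encoder instead.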
Taxonomy
Topics: Educational Strategies and Epistemologies · Education and Critical Thinking Development · Innovative Teaching and Learning Methods
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection
