TL;DR
This paper introduces REFINE, a retrieval-enhanced feedback framework that organizes model errors into structured, targeted feedback to improve multimodal reasoning in large language models while keeping inference efficient and scalable.
Contribution
The authors present REFINE as the first framework to systematically structure errors and feedback in multimodal LLMs, improving reasoning accuracy and efficiency without redundant retrievals.
Findings
Significant speedup in inference time
Reduced computational costs
Effective generalization across tasks
Abstract
Recent advancements in Large Language Models (LLMs) have significantly improved reasoning capabilities, with in-context learning (ICL) emerging as a key technique for adaptation without retraining. While previous works have focused on leveraging correct examples, recent research highlights the importance of learning from errors to enhance performance. However, existing methods lack a structured framework for analyzing and mitigating errors, particularly in Multimodal Large Language Models (MLLMs), where integrating visual and textual inputs adds complexity. To address this issue, we propose REFINE: Retrieval-Enhanced Feedback via In-context Neural Error-book, a teacher-student framework that systematically structures errors and provides targeted feedback. REFINE introduces three systematic queries to construct structured feedback -- Feed-Target, Feed-Check, and Feed-Path -- to enhance…
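As a rough illustration of the idea described in the abstract, the sketch below stores past errors in an "error book" and retrieves the most similar entry once per query, then places its three feedback fields in the student's prompt. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `ErrorEntry`, `retrieve_feedback`, and `build_prompt` are hypothetical, the embeddings are assumed to come from some external encoder, cosine similarity is an assumed retrieval metric, and the content of the Feed-Target, Feed-Check, and Feed-Path fields is treated as opaque text because the abstract does not define it.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorEntry:
    """One hypothetical error-book record (field semantics are assumptions)."""
    question: str            # question the student model previously got wrong
    embedding: list[float]   # vector from some external encoder (assumed)
    feed_target: str         # teacher text for the Feed-Target query
    feed_check: str          # teacher text for the Feed-Check query
    feed_path: str           # teacher text for the Feed-Path query


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_feedback(query_embedding: list[float],
                      error_book: list[ErrorEntry]) -> Optional[ErrorEntry]:
    """Single retrieval pass: return the most similar past error, if any."""
    if not error_book:
        return None
    return max(error_book, key=lambda e: cosine(query_embedding, e.embedding))


def build_prompt(question: str, entry: Optional[ErrorEntry]) -> str:
    """Prepend the retrieved structured feedback to the new question."""
    if entry is None:
        return question
    return (
        f"Feed-Target: {entry.feed_target}\n"
        f"Feed-Check: {entry.feed_check}\n"
        f"Feed-Path: {entry.feed_path}\n"
        f"Question: {question}"
    )
```

In this sketch the retrieval happens exactly once per question, which is one simple way to read the "without redundant retrievals" claim; the paper may realize it differently.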
