CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go, Taesun Whang, Chanhee Lee, Hwa-Yeon Kim, Sunghoon Park, Seunghwan Ji, Jinho Kim, Dongchan Kim, Young-Bum Kim

TL;DR
CUE-M is a novel multimodal search framework that enhances understanding and retrieval accuracy by integrating image context, intent refinement, external APIs, and dynamic filtering, significantly improving performance on knowledge-based and safety benchmarks.
Contribution
This paper introduces CUE-M, a comprehensive multimodal search system that addresses current limitations through a multi-stage pipeline and adaptive filtering, setting new state-of-the-art results.
Findings
Outperforms existing baselines on knowledge-based VQA benchmarks.
Achieves new state-of-the-art results in multimodal retrieval accuracy.
Effectively filters inappropriate responses using dynamic, policy-driven classifiers.
Abstract
The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) has revolutionized information retrieval and expanded the practical applications of AI. However, current systems struggle to accurately interpret user intent, employ diverse retrieval strategies, and effectively filter unintended or inappropriate responses, which limits their effectiveness. This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework that addresses these challenges through a multi-stage pipeline comprising image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. CUE-M incorporates a robust filtering pipeline combining image-based, text-based, and multimodal classifiers, dynamically adapting to instance- and category-specific concern…
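The five stages named in the abstract can be pictured as a chain of functions that each add to a shared request object. The sketch below is only an illustration of that shape, not the authors' implementation: every name in it (`Request`, `CORPUS`, `run_pipeline`, and the stage functions) is hypothetical, the "external API" is a three-sentence in-memory list, and relevance is approximated by simple word overlap rather than by any model.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Request:
    image_description: str              # stand-in for what an MLLM would extract from the image
    question: str
    intent: str = ""
    queries: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)

# Hypothetical toy corpus standing in for external search APIs.
CORPUS = [
    "The Eiffel Tower in Paris is 330 metres tall.",
    "Mount Fuji is the highest mountain in Japan.",
    "The Eiffel Tower was completed in 1889.",
]

def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def enrich_image_context(req: Request) -> Request:
    # Assumption: a real system would caption the image or detect entities here.
    req.image_description = req.image_description.strip()
    return req

def refine_intent(req: Request) -> Request:
    # Combine the question with the image context into one intent statement.
    req.intent = f"{req.question.strip()} (image shows: {req.image_description})"
    return req

def generate_queries(req: Request) -> Request:
    # Build a search query that carries the image context along with the question.
    req.queries = [f"{req.image_description} {req.question.strip()}"]
    return req

def call_external_apis(req: Request) -> Request:
    # Broad recall: keep any corpus entry sharing at least one word with a query.
    for query in req.queries:
        for doc in CORPUS:
            if _tokens(query) & _tokens(doc) and doc not in req.documents:
                req.documents.append(doc)
    return req

def filter_by_relevance(req: Request, min_overlap: int = 2) -> Request:
    # Keep documents sharing at least `min_overlap` words with the refined intent.
    intent_tokens = _tokens(req.intent)
    req.documents = [
        d for d in req.documents if len(intent_tokens & _tokens(d)) >= min_overlap
    ]
    return req

def run_pipeline(req: Request) -> Request:
    # Apply the five stages in the order the abstract lists them.
    for stage in (enrich_image_context, refine_intent, generate_queries,
                  call_external_apis, filter_by_relevance):
        req = stage(req)
    return req

result = run_pipeline(Request("Eiffel Tower", "How tall is it?"))
for doc in result.documents:
    print(doc)
```

In this toy run the retrieval stage returns all three corpus entries, because even the Mount Fuji sentence shares the word "is" with the query, and the relevance stage then drops it. Keeping recall and filtering as separate stages is the design point being illustrated; the safety classifiers the abstract mentions are not modelled here.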
Taxonomy
Topics: Topic Modeling
