CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model

Dongyoung Go; Taesun Whang; Chanhee Lee; Hwa-Yeon Kim; Sunghoon Park; Seunghwan Ji; Jinho Kim; Dongchan Kim; Young-Bum Kim

arXiv:2411.12287 · cs.CL · March 24, 2025

TL;DR

CUE-M is a novel multimodal search framework that enhances understanding and retrieval accuracy by integrating image context, intent refinement, external APIs, and dynamic filtering, significantly improving performance on knowledge-based and safety benchmarks.

Contribution

This paper introduces CUE-M, a multimodal search system that addresses the weaknesses of current systems in intent interpretation, retrieval strategy, and response filtering through a multi-stage pipeline and adaptive filtering, and reports new state-of-the-art results.

Findings

01. Outperforms existing baselines on knowledge-based VQA benchmarks.

02. Achieves new state-of-the-art results in multimodal retrieval accuracy.

03. Effectively filters inappropriate responses using dynamic, policy-driven classifiers.
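
To make the idea of policy-driven filtering concrete, here is a minimal sketch of how image-based, text-based, and multimodal classifiers might be combined under a per-category threshold. Everything in it is an assumption for illustration: the keyword-based scoring functions, the `POLICY` table, the category names, and the max-score rule are hypothetical stand-ins, not the classifiers or policies used in CUE-M.

```python
from typing import Callable, Dict, List

# A classifier maps (image description, query text) to a risk score in [0, 1].
# These keyword rules are toy stand-ins for trained models (assumption).
Classifier = Callable[[str, str], float]


def image_clf(image: str, text: str) -> float:
    """Image-only signal: looks at the image description alone."""
    return 0.6 if "weapon" in image.lower() else 0.0


def text_clf(image: str, text: str) -> float:
    """Text-only signal: looks at the user query alone."""
    return 0.6 if "how to make" in text.lower() else 0.0


def multimodal_clf(image: str, text: str) -> float:
    """Joint signal: fires only when image and text are risky together."""
    risky = "weapon" in image.lower() and "how to make" in text.lower()
    return 0.9 if risky else 0.0


CLASSIFIERS: List[Classifier] = [image_clf, text_clf, multimodal_clf]

# Hypothetical policy table: a lower threshold means a stricter category.
POLICY: Dict[str, float] = {"general": 0.8, "strict": 0.5}


def is_blocked(image: str, text: str, category: str) -> bool:
    """Block when the highest classifier score reaches the category threshold."""
    threshold = POLICY.get(category, POLICY["general"])
    top_score = max(clf(image, text) for clf in CLASSIFIERS)
    return top_score >= threshold
```

Under these assumptions the same input can pass in one category and be blocked in another, which is the sense in which the filter is "dynamic": changing the policy table changes behavior without touching the classifiers.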

Abstract

The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) has revolutionized information retrieval and expanded the practical applications of AI. However, current systems struggle to interpret user intent accurately, to employ diverse retrieval strategies, and to filter unintended or inappropriate responses effectively, which limits their usefulness. This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework that addresses these challenges through a multi-stage pipeline comprising image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. CUE-M incorporates a robust filtering pipeline combining image-based, text-based, and multimodal classifiers, dynamically adapting to instance- and category-specific concern…
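
The five stages named in the abstract can be pictured as a chain of functions over a shared state object. The sketch below is only an illustration of that control flow: the `SearchState` fields, the function names, the string-based "image description", and the keyword-overlap relevance test are all hypothetical, and the `search` callable stands in for whatever external APIs a real system would call. It does not reproduce the models or prompts used by CUE-M.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SearchState:
    image_desc: str                       # text stand-in for the input image
    user_query: str
    context: List[str] = field(default_factory=list)
    intent: str = ""
    queries: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


def enrich_image_context(s: SearchState) -> SearchState:
    """Stage 1: attach context derived from the image (toy version)."""
    s.context.append(f"image shows: {s.image_desc}")
    return s


def refine_intent(s: SearchState) -> SearchState:
    """Stage 2: rewrite the query so it is explicit about the image subject."""
    s.intent = f"{s.user_query.strip()} (about {s.image_desc})"
    return s


def generate_queries(s: SearchState) -> SearchState:
    """Stage 3: produce search queries from the refined intent."""
    s.queries = [s.intent, s.image_desc]
    return s


def call_external_apis(s: SearchState, search: Callable[[str], List[str]]) -> SearchState:
    """Stage 4: run every query through the injected search function."""
    results: List[str] = []
    for q in s.queries:
        results.extend(search(q))
    s.documents = list(dict.fromkeys(results))  # de-duplicate, keep order
    return s


def filter_by_relevance(s: SearchState) -> SearchState:
    """Stage 5: keep documents that share a word with the image subject."""
    keywords = set(s.image_desc.lower().split())
    s.documents = [d for d in s.documents if keywords & set(d.lower().split())]
    return s


def run_pipeline(image_desc: str, user_query: str,
                 search: Callable[[str], List[str]]) -> SearchState:
    s = SearchState(image_desc, user_query)
    s = enrich_image_context(s)
    s = refine_intent(s)
    s = generate_queries(s)
    s = call_external_apis(s, search)
    return filter_by_relevance(s)
```

Passing the search function in as an argument keeps the sketch runnable offline: a caller can supply a stub that returns canned results, then swap in a real client later.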

Taxonomy

Topics: Topic Modeling