Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

Wongyu Kim; Hochang Lee; Sanghak Lee; Yoonsung Kim; Jaehyun Park

arXiv:2511.02358·cs.CL·November 5, 2025

Open Access

TL;DR

This paper introduces M-Solomon, a multimodal embedder that adaptively decides when to augment queries, improving efficiency and effectiveness in multimodal information retrieval tasks.

Contribution

M-Solomon is the first universal multimodal embedder capable of adaptive query augmentation. It reduces embedding latency compared with augmenting every query and improves retrieval performance.

Findings

01 M-Solomon outperforms the baseline that uses no query augmentation.

02 It is more accurate than models that always augment queries.

03 It achieves lower embedding latency while maintaining high performance.

Abstract

Query augmentation makes queries more informative by appending additional information that helps find relevant documents. Recent studies have proposed Large Language Model (LLM)-based embedders that learn representations for embedding and generation for query augmentation in a multi-task manner, leveraging the generative capabilities of LLMs. During inference, these jointly trained embedders perform query augmentation followed by embedding, with effective results. However, augmenting every query incurs substantial embedding latency, and augmentation can hurt performance for some queries. Moreover, previous methods have not been explored in multimodal settings. To tackle these problems, we propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. Our approach first divides the queries of the training…
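The inference-time idea in the abstract (augment only when it is likely to help, otherwise embed the query as-is) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated here and does not say how M-Solomon makes the decision, so `should_augment`, `augment`, and the word-count heuristic below are hypothetical stand-ins chosen only to make the control flow concrete.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmbedInput:
    text: str        # the text that would be passed to the embedder
    augmented: bool  # whether augmentation was applied to the query


def adaptive_embed_input(
    query: str,
    should_augment: Callable[[str], bool],  # hypothetical gate, not from the paper
    augment: Callable[[str], str],          # hypothetical augmentation generator
) -> EmbedInput:
    """Append generated context only when the gate asks for it."""
    if should_augment(query):
        # Costly path: generate extra text, then embed query + augmentation.
        return EmbedInput(text=f"{query} {augment(query)}", augmented=True)
    # Cheap path: skip generation and embed the raw query.
    return EmbedInput(text=query, augmented=False)


# Toy stand-ins (assumptions for illustration only).
def toy_should_augment(query: str) -> bool:
    # Pretend that very short queries are the ones that benefit from augmentation.
    return len(query.split()) < 4


def toy_augment(query: str) -> str:
    return f"(hypothetical added context for: {query})"


queries = [
    "red shoes",
    "find the product page matching this photo of a blue ceramic mug",
]
prepared = [adaptive_embed_input(q, toy_should_augment, toy_augment) for q in queries]
```

In this sketch only the first query takes the generation path, which is where the latency saving over an always-augment pipeline would come from.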

Taxonomy

Topics: Advanced Graph Neural Networks · Information Retrieval and Search Behavior · Topic Modeling