Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim, Jaehyun Park

TL;DR
This paper introduces M-Solomon, a multimodal embedder that adaptively decides when to augment queries, improving efficiency and effectiveness in multimodal information retrieval tasks.
Contribution
M-Solomon is the first universal multimodal embedder capable of adaptive query augmentation, reducing latency and enhancing retrieval performance.
Findings
M-Solomon outperforms baseline models that do not use query augmentation.
It is more accurate than models that always augment queries.
It achieves lower embedding latency while maintaining high retrieval performance.
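The latency finding can be made concrete with a back-of-the-envelope cost model. This is a minimal sketch under stated assumptions, not the paper's measurement setup: it treats per-query cost as an (assumed) decision step, an optional generation step, and an embedding pass. The class `LatencyModel`, its fields `t_decide`, `t_augment`, `t_embed`, and the fraction `p_augment` are hypothetical names, and the timings in the usage lines are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Hypothetical per-query latency model; inputs are illustrative, not measured."""

    t_decide: float   # assumed cost of deciding whether to augment
    t_augment: float  # assumed cost of generating the augmentation text
    t_embed: float    # assumed cost of the final embedding pass

    def no_augment(self) -> float:
        # Baseline that never augments: embedding only.
        return self.t_embed

    def always_augment(self) -> float:
        # Every query pays for generation before embedding.
        return self.t_augment + self.t_embed

    def adaptive(self, p_augment: float) -> float:
        # Expected cost when a fraction p_augment of queries is augmented;
        # every query still pays the (assumed) decision cost.
        if not 0.0 <= p_augment <= 1.0:
            raise ValueError("p_augment must be in [0, 1]")
        return self.t_decide + p_augment * self.t_augment + self.t_embed


# Placeholder timings in milliseconds, chosen only for illustration.
model = LatencyModel(t_decide=5.0, t_augment=200.0, t_embed=20.0)
print(model.always_augment())  # 220.0
print(model.adaptive(0.25))    # 75.0
```

Under this model, adaptive augmentation beats always augmenting whenever `t_decide < (1 - p_augment) * t_augment`, i.e. when the decision step is cheap relative to the generation it avoids. It can never be cheaper than `no_augment()`, so in this model the latency advantage is relative to always augmenting, not to skipping augmentation altogether.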
Abstract
Query augmentation makes queries more meaningful by appending further information to them, which helps retrieve relevant documents. Recent studies have proposed Large Language Model (LLM)-based embedders that leverage the generative capabilities of LLMs to learn representations for embedding and generation for query augmentation in a multi-task manner. During inference, these jointly trained embedders perform query augmentation followed by embedding, showing effective results. However, augmenting every query leads to substantial embedding latency, and query augmentation can be detrimental to performance for some queries. Moreover, previous methods have not been explored in multimodal environments. To tackle these problems, we propose M-Solomon, a universal multimodal embedder that can adaptively determine when to augment queries. Our approach first divides the queries of the training…
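The abstract describes the inference-time behavior only at a high level and is truncated before the training details. As a hedged illustration of the control flow it implies, the sketch below embeds a query directly unless a decision function says augmentation is needed, in which case it generates extra text, appends it to the query, and embeds the combined string. According to the abstract, M-Solomon makes this decision itself; here `adaptive_embed`, `should_augment`, `augment`, and `embed` are hypothetical names, the example is text-only although M-Solomon targets multimodal queries, and the toy stand-ins (a word-count rule, a lookup table, and a hashed bag-of-words embedding) are placeholders, not the paper's components.

```python
import math
import zlib
from typing import Callable

Vector = list[float]


def adaptive_embed(
    query: str,
    should_augment: Callable[[str], bool],
    augment: Callable[[str], str],
    embed: Callable[[str], Vector],
) -> tuple[Vector, bool]:
    """Embed `query`, generating and appending an augmentation only when needed.

    Returns the embedding and a flag saying whether augmentation was used.
    """
    if should_augment(query):
        # Append generated context to the original query, then embed the combined text.
        return embed(f"{query} {augment(query)}"), True
    # Skip generation entirely; this branch is where latency is saved.
    return embed(query), False


# --- Hypothetical toy stand-ins (text-only placeholders, not M-Solomon's components) ---

def toy_should_augment(query: str) -> bool:
    # Placeholder rule: treat very short queries as under-specified.
    return len(query.split()) < 4


TOY_AUGMENTATIONS = {"red sneakers": "athletic shoes footwear"}


def toy_augment(query: str) -> str:
    # Placeholder generator: look up canned extra context, else add nothing.
    return TOY_AUGMENTATIONS.get(query, "")


def toy_embed(text: str, dim: int = 8) -> Vector:
    # Placeholder embedder: L2-normalized bag of words hashed with a stable checksum.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


if __name__ == "__main__":
    for q in ["red sneakers", "a photo of a dog running on a beach"]:
        vec, used = adaptive_embed(q, toy_should_augment, toy_augment, toy_embed)
        print(f"{q!r}: augmented={used}, dim={len(vec)}")
```

Passing the three steps in as callables keeps the routing logic separate from whatever model implements them, so the same function covers both the adaptive case and, with `should_augment=lambda q: True`, the always-augment baseline.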
Taxonomy
Topics: Advanced Graph Neural Networks · Information Retrieval and Search Behavior · Topic Modeling
