SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs

Haoran Lou; Ziyan Liu; Chunxiao Fan; Yuexin Wu; Yue Ming; Hao Wu; Kai Zuo; Yibo Chen; Xu Tang

arXiv:2604.13710 · cs.CV · May 12, 2026

TL;DR

SLQ is a parameter-efficient framework that adapts frozen multimodal large language models (MLLMs) for retrieval by appending a small set of shared latent queries to both text and image tokens. It preserves the pre-trained model's knowledge and outperforms invasive methods such as full fine-tuning and LoRA on COCO and Flickr30K.

Contribution

Introduces SLQ, a novel non-invasive tuning method with shared latent queries for multimodal retrieval, and constructs KARR-Bench for knowledge-aware reasoning evaluation.

Findings

01. SLQ outperforms full fine-tuning and LoRA on the COCO and Flickr30K datasets.

02. SLQ achieves competitive performance on MMEB.

03. SLQ yields substantial gains on the KARR-Bench benchmark.

Abstract

Multimodal Large Language Models (MLLMs) possess intrinsic reasoning and world-knowledge capabilities, yet adapting them for dense retrieval remains challenging. Existing approaches rely on invasive parameter updates, such as full fine-tuning and LoRA, which may disrupt the pre-trained semantic space and impair the structured knowledge essential for reasoning. To address this, we propose SLQ, a parameter-efficient tuning framework that adapts MLLMs for retrieval while keeping the backbone entirely frozen. SLQ introduces a small set of Shared Latent Queries that are appended to both text and image tokens, leveraging the model's native causal attention to aggregate multimodal context into a unified embedding space. Furthermore, to better evaluate retrieval beyond superficial pattern matching, we construct KARR-Bench, a benchmark designed for knowledge-aware reasoning retrieval. Extensive…
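
The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single-head attention with identity projections stands in for a frozen backbone layer, and the mean pooling over query outputs, the dimensions, and the names `causal_attention`, `embed`, and `shared_queries` are all assumptions made for illustration. It only shows how one shared set of query vectors, appended after either text or image tokens, can read out an embedding through causal attention.

```python
import math
import random


def causal_attention(x):
    """Toy single-head causal self-attention with identity projections.

    Stand-in for a frozen backbone layer (assumption: no trainable weights).
    Position i attends only to positions 0..i.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] * x[j][k] for j in range(i + 1)) / total
                    for k in range(d)])
    return out


def embed(tokens, latent_queries):
    """Append the shared queries after the content tokens and pool their outputs.

    Mean pooling plus L2 normalization is an illustrative choice, not taken
    from the paper.
    """
    hidden = causal_attention(tokens + latent_queries)
    query_states = hidden[len(tokens):]  # outputs at the query positions only
    pooled = [sum(col) / len(query_states) for col in zip(*query_states)]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled]


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4  # hypothetical sizes


def random_tokens(n):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


# The shared queries would be the only trainable parameters in this sketch.
shared_queries = random_tokens(NUM_QUERIES)
text_tokens = random_tokens(5)    # hypothetical text token embeddings
image_tokens = random_tokens(12)  # hypothetical image patch embeddings

text_emb = embed(text_tokens, shared_queries)
image_emb = embed(image_tokens, shared_queries)

# Both embeddings are unit-norm, so the dot product is the cosine similarity.
similarity = sum(a * b for a, b in zip(text_emb, image_emb))
```

Because the queries come last and attention is causal, the hidden states of the original text or image tokens are identical with and without the appended queries. The queries only read from the context, which is one way to see why this style of adaptation leaves the backbone's representations untouched.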

Code & Models

Repositories

CnFaker/SLQ (GitHub)
