PLUME: Latent Reasoning Based Universal Multimodal Embedding

Chenwei He, Xiangzhao Hao, Tianyu Yang, Yuxiang Ma, Yuheng Jia, Lingxiang Wu, Chaoyang Zhao, Haiyun Guo, and Jinqiao Wang

arXiv:2604.02073 · cs.CV · April 21, 2026

TL;DR

PLUME introduces a latent reasoning framework for universal multimodal embedding that replaces explicit chain-of-thought with continuous latent states, achieving faster inference and improved performance on complex multimodal tasks.

Contribution

PLUME proposes a latent reasoning approach built on a semantic-anchor-guided transition adapter and a progressive explicit-to-latent training curriculum, and it outperforms explicit-CoT methods in both speed and accuracy.

Findings

01. Outperforms explicit-CoT UME baselines on the MMEB-v2 benchmark.

02. Reduces reasoning from hundreds of generated tokens to fewer than 10 latent steps.

03. Achieves over 30x faster inference in retrieval tasks.

Abstract

Universal multimodal embedding (UME) maps heterogeneous inputs into a shared retrieval space with a single model. Recent approaches improve UME by generating explicit chain-of-thought (CoT) rationales before extracting embeddings, enabling multimodal large language models to better infer complex query intent. However, explicit CoT incurs substantial inference overhead and can compress rich multimodal evidence into a narrow textual bottleneck. We propose PLUME, a latent reasoning framework that advances UME by replacing verbalized CoT with a short autoregressive rollout of continuous latent states. To support diverse multimodal queries, PLUME further introduces a semantic-anchor-guided transition adapter that steers latent rollout along different reasoning trajectories under the same fixed computation budget. To stabilize training, PLUME adopts a progressive explicit-to-latent curriculum…
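
To make the idea of a short latent rollout concrete, the following is a toy sketch only, not the authors' implementation. It assumes a hypothetical update rule `h_{t+1} = tanh(W h_t + anchor)`, where the `anchor` vector stands in for the semantic anchor that steers the trajectory, and `num_steps` is the fixed computation budget. All names (`latent_rollout`, `W`, `anchor_a`, `anchor_b`) and the update rule itself are illustrative assumptions; the paper's actual adapter and backbone are not described in this listing.

```python
import math
import random


def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalize(v):
    """Scale a vector to unit length so it can be compared by cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def latent_rollout(h0, W, anchor, num_steps=8):
    """Toy latent rollout (assumed rule, not PLUME's): h <- tanh(W h + anchor).

    Each step feeds the previous continuous state back in, so no text is
    generated between steps. The final state is returned as the embedding.
    """
    h = list(h0)
    for _ in range(num_steps):
        pre = matvec(W, h)
        h = [math.tanh(p + a) for p, a in zip(pre, anchor)]
    return l2_normalize(h)


random.seed(0)
dim = 4
W = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
h0 = [random.uniform(-1.0, 1.0) for _ in range(dim)]

# Two hypothetical anchors: same initial state, same step budget,
# but different trajectories and therefore different embeddings.
anchor_a = [0.5, 0.0, 0.0, 0.0]
anchor_b = [0.0, 0.0, 0.0, 0.5]
emb_a = latent_rollout(h0, W, anchor_a)
emb_b = latent_rollout(h0, W, anchor_b)
```

In this sketch the cost is fixed at `num_steps` matrix-vector products regardless of the anchor, which mirrors the abstract's point about steering different reasoning trajectories under one computation budget.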

Code & Models

Repositories

haoxiangzhao12138/PLUME
github

Models

CUDAOUTOFMEMORY/PLUME-Qwen2-VL-2B
model · 9 downloads · 1 like
