Do Reasoning Models Enhance Embedding Models?

Wun Yu Chan; Shaojin Chen; Huihao Jing; Kwun Hang Lau; Elton Chun-Chai Li; Zihao Wang; Haoran Li; Yangqiu Song

arXiv:2601.21192 · cs.AI · January 30, 2026

TL;DR

This paper investigates whether reasoning-enhanced models yield better semantic embeddings. It finds that backbones trained with RLVR do not significantly outperform their base counterparts on embedding tasks, an outcome the authors attribute to manifold realignment during contrastive training.

Contribution

Introduces HRSA (Hierarchical Representation Similarity Analysis), a framework for analyzing representation similarity, and shows how RLVR reshapes latent geometry without improving embedding performance.
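The page does not describe how HRSA computes similarity at each of its levels. As a hedged illustration of a representation-level comparison between a base and an RLVR-tuned backbone, the sketch below computes linear Centered Kernel Alignment (CKA), a standard similarity index that is swapped in here and is not confirmed to be part of HRSA. It assumes both models embed the same n inputs, given as plain Python lists (rows are inputs, columns are features); all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # n rows (inputs) x d columns (features)


def center_columns(x: Matrix) -> Matrix:
    """Subtract each feature's mean over the n inputs."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in x]


def gram(x: Matrix) -> Matrix:
    """Linear kernel K = X X^T: an n x n matrix of input-by-input dot products."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def frobenius_inner(k: Matrix, l: Matrix) -> float:
    """<K, L>_F = sum over i, j of K_ij * L_ij."""
    n = len(k)
    return sum(k[i][j] * l[i][j] for i in range(n) for j in range(n))


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same inputs.

    x is n x d1 and y is n x d2 (d1 and d2 may differ). Returns a value in
    [0, 1]; 1 means the centered representations agree up to an orthogonal
    transform and isotropic scaling.
    """
    if len(x) != len(y):
        raise ValueError("x and y must represent the same n inputs")
    kx = gram(center_columns(x))
    ky = gram(center_columns(y))
    return frobenius_inner(kx, ky) / math.sqrt(
        frobenius_inner(kx, kx) * frobenius_inner(ky, ky)
    )


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
    rotated = [[-b, a] for a, b in base]  # 90-degree rotation of every point
    print(round(linear_cka(base, rotated), 6))  # → 1.0
```

Because CKA ignores rotations and uniform rescaling, it compares the overall shape of two embedding spaces rather than their coordinates. It is undefined (division by zero) if either representation is constant across all inputs.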

Findings

1. RLVR reorganizes local geometry but preserves global structure.
2. Contrastive learning aligns base and reasoning models, leading to manifold realignment.
3. RLVR optimizes within existing semantic landscapes rather than restructuring them.
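The first finding contrasts local and global geometry. One common way to quantify local change, used here as an illustrative stand-in and not necessarily the measure HRSA uses, is k-nearest-neighbour overlap: for each input, compare its k nearest neighbours (by cosine similarity) in one model's embedding space with its k nearest neighbours in the other's. The function names and the default k = 10 are hypothetical.

```python
import math
from typing import List, Set

Matrix = List[List[float]]  # n rows (inputs) x d columns (features)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_indices(x: Matrix, i: int, k: int) -> Set[int]:
    """Indices of the k rows most cosine-similar to row i, excluding i itself."""
    others = [j for j in range(len(x)) if j != i]
    others.sort(key=lambda j: cosine(x[i], x[j]), reverse=True)
    return set(others[:k])


def knn_overlap(x: Matrix, y: Matrix, k: int = 10) -> float:
    """Mean fraction of shared k-nearest neighbours across all n inputs.

    1.0 means every input keeps exactly the same neighbours in both spaces.
    """
    n = len(x)
    if len(y) != n or not 0 < k < n:
        raise ValueError("need the same n inputs in x and y and 0 < k < n")
    shared = sum(len(knn_indices(x, i, k) & knn_indices(y, i, k)) for i in range(n))
    return shared / (k * n)


if __name__ == "__main__":
    # In the "before" space, inputs 0/1 and 2/3 are nearest-neighbour pairs ...
    before = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    # ... here the same four vectors are reassigned so the pairs become 0/2 and 1/3.
    after = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
    print(knn_overlap(before, before, k=1))  # → 1.0
    print(knn_overlap(before, after, k=1))   # → 0.0
```

Pairing a neighbourhood score like this with a global similarity measure is one way to separate local rearrangement from changes to the overall layout of an embedding space.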

Abstract

State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: does enhanced reasoning translate to superior semantic representations when these models serve as embedding initializations? Contrary to expectation, our evaluation on MTEB and BRIGHT reveals a **null effect**: embedding models initialized from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes. To unpack this paradox, we introduce **H**ierarchical **R**epresentation **S**imilarity **A**nalysis (HRSA), a framework that decomposes similarity across representation, geometry, and function levels. HRSA reveals that while RLVR…
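The abstract says embedding models are obtained by adapting decoder-only backbones via contrastive learning but does not give the recipe here. A common objective for this is InfoNCE with in-batch negatives; the sketch below assumes that objective, cosine similarity, and a hypothetical temperature of 0.05, and it omits pooling (how a single vector is read out of the decoder) and all model code. It is not the authors' training code, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def info_nce(queries: List[Vector], docs: List[Vector], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    docs[i] is the positive for queries[i]; every other doc in the batch
    serves as a negative. Lower is better.
    """
    if len(queries) != len(docs) or not queries:
        raise ValueError("need a non-empty batch of aligned (query, doc) pairs")
    q = [l2_normalize(v) for v in queries]
    d = [l2_normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, dj)) / temperature for dj in d]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(q)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(q, q) < 1e-6)                        # matched pairs: → True
    print(info_nce(q, [[0.0, 1.0], [1.0, 0.0]]) > 19)   # swapped pairs: → True
```

Holding an objective like this and its hyperparameters fixed while swapping only the backbone (base versus RLVR-tuned) would correspond to the "identical training recipes" comparison the abstract describes.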
