REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning
Seungmin Lee, Jeonghwan Lee, Hyunkuk Lim, Sejoon Kim, Mingi Sung

TL;DR
REZE is a representation regularization framework for domain-adaptive text embedding pre-finetuning. It explicitly controls representation shift during pre-finetuning, which the authors report improves robustness and benchmark performance.
Contribution
The paper introduces a representation regularization method that explicitly controls representation shift during embedding pre-finetuning; the authors report that it outperforms existing approaches.
Findings
REZE outperforms standard pre-finetuning methods on multiple benchmarks.
REZE maintains stable embeddings in settings where other methods collapse.
Analysis shows that REZE keeps embedding shifts aligned with the original (pretrained) embedding manifold.
Abstract
Recent text embedding models are often adapted to specialized domains via contrastive pre-finetuning (PFT) on a naive collection of scattered, heterogeneous tasks. However, this approach often introduces task-induced bias alongside domain knowledge, leading to uncontrolled representation shifts that distort the pretrained embedding geometry and cause substantial performance degradation. To address this issue, we propose REZE, a representation regularization framework that explicitly controls representation shift during embedding pre-finetuning. REZE operates on the relations of anchor-positive pairs and decomposes them in an eigenspace. It then measures task-wise dispersion along each eigencomponent to identify task-variant directions and applies adaptive soft-shrinkage to suppress task-induced noise while preserving task-invariant semantic structure, without inference-time overhead.…
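The pipeline described in the abstract (relations of anchor-positive pairs, eigenspace decomposition, task-wise dispersion per eigencomponent, adaptive soft-shrinkage) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation, which the abstract does not specify. Several choices here are assumptions made purely for illustration: the relation is taken to be the difference vector `positive - anchor`, dispersion is the variance of per-task mean coefficients, and the threshold rule and the names `regularize_relations`, `soft_shrink`, and `strength` are hypothetical.

```python
import numpy as np


def soft_shrink(x, threshold):
    """Soft-thresholding: reduce each magnitude by `threshold`, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)


def regularize_relations(anchors, positives, task_ids, strength=1.0):
    """Illustrative sketch only; every modeling choice below is an assumption.

    anchors, positives: arrays of shape (n, d) holding embeddings.
    task_ids: array of shape (n,) with one task label per pair.
    Returns the shrunk relation vectors (n, d) and per-component dispersion (d,).
    """
    # Assumed definition of a "relation": the anchor-to-positive difference.
    relations = positives - anchors
    mean = relations.mean(axis=0)
    centered = relations - mean

    # Eigendecomposition of the relation covariance (symmetric, so eigh applies).
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # columns are eigencomponents

    # Coefficients of each relation along each eigencomponent.
    coeffs = centered @ eigvecs

    # Assumed dispersion measure: variance of per-task mean coefficients.
    tasks = np.unique(task_ids)
    task_means = np.stack([coeffs[task_ids == t].mean(axis=0) for t in tasks])
    dispersion = task_means.var(axis=0)

    # Assumed adaptive rule: task-variant directions get larger thresholds.
    scale = dispersion / (dispersion.max() + 1e-12)
    thresholds = strength * scale * np.abs(coeffs).mean(axis=0)

    # Shrink in the eigenbasis, then map back to embedding space.
    shrunk = soft_shrink(coeffs, thresholds)
    return shrunk @ eigvecs.T + mean, dispersion
```

In this sketch, directions whose per-task means disagree strongly are shrunk the most, while directions that behave the same across tasks are left nearly untouched. With `strength=0.0` the relations are returned unchanged. In a training setup, a quantity of this kind would enter the loss rather than the model's forward pass, which is consistent with the abstract's claim of no inference-time overhead.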
