LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation

Shaoxiong Zhan; Hai Lin; Hongming Tan; Xiaodong Cai; Hai-Tao Zheng; Xin Su; Zifei Shan; Ruitong Liu; Hong-Gee Kim

arXiv:2508.17858 · cs.IR · September 30, 2025

TL;DR

LexSemBridge enhances dense retrieval models by incorporating fine-grained, input-aware vector modulation to improve performance on challenging, fine-grained retrieval tasks in both text and vision domains.

Contribution

Introduces LexSemBridge, a novel framework that improves dense representations with input-aware vector modulation using three paradigms, without altering backbone encoders.

Findings

1. Significant improvements on fine-grained retrieval tasks
2. Effective across both text and vision modalities
3. Plug-in design preserves semantic integrity

Abstract

As queries in retrieval-augmented generation (RAG) pipelines powered by large language models (LLMs) become increasingly complex and diverse, dense retrieval models have demonstrated strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks, where precise keyword alignment and span-level localization are required, even in cases with high lexical overlap that would intuitively suggest easier retrieval. To systematically evaluate this limitation, we introduce two targeted tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, we propose LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using three…
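The abstract is cut off before it names the three paradigms, so the following is only an illustrative sketch of what "input-aware vector modulation" of a dense query embedding could look like. The bag-of-tokens count vector, the random projection matrix `W`, the multiplicative `1 + tanh` gating, and all sizes and values are assumptions made up for this example; none of them is taken from the paper. The one property it does mirror is that the backbone's output is treated as fixed and only modulated afterwards.

```python
import math
import random

def token_count_vector(token_ids, vocab_size):
    """Bag-of-tokens counts over a (hypothetical) vocabulary."""
    counts = [0.0] * vocab_size
    for tid in token_ids:
        counts[tid] += 1.0
    return counts

def project(vec, weights):
    """Linear map: `weights` has one row per output dimension."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def modulate(dense, enhancement):
    """Scale each dense dimension by (1 + tanh(e)), then L2-normalize."""
    out = [d * (1.0 + math.tanh(e)) for d, e in zip(dense, enhancement)]
    norm = math.sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]

# Toy setup: every size and value below is invented for illustration.
random.seed(0)
VOCAB_SIZE, DIM = 8, 4
# Assumed projection from token space to embedding space (random here,
# where a real system would presumably learn it).
W = [[random.uniform(-0.5, 0.5) for _ in range(VOCAB_SIZE)] for _ in range(DIM)]

dense_query = [0.5, -0.2, 0.1, 0.8]  # stand-in for a frozen encoder's output
token_ids = [1, 3, 3, 6]             # stand-in for the tokenized query

enhancement = project(token_count_vector(token_ids, VOCAB_SIZE), W)
enhanced_query = modulate(dense_query, enhancement)
```

Because `1 + tanh(e)` is always positive, this particular gating rescales each dimension without flipping its sign, and a zero enhancement vector leaves the normalized embedding unchanged. That is one simple way to keep the original semantic direction largely intact, though the paper may achieve this differently.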

Code & Models

Models

Jasaxion/LexSemBridge_CLR_snowflake (Hugging Face model)

Datasets

Jasaxion/LexSemBridge_eval (dataset, 39 downloads)
