LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation
Shaoxiong Zhan, Hai Lin, Hongming Tan, Xiaodong Cai, Hai-Tao Zheng, Xin Su, Zifei Shan, Ruitong Liu, Hong-Gee Kim

TL;DR
LexSemBridge augments dense retrieval models with fine-grained, input-aware vector modulation, improving performance on retrieval tasks that require precise keyword alignment and span-level localization, in both text and vision domains.
Contribution
Introduces LexSemBridge, a novel framework that improves dense representations with input-aware vector modulation using three paradigms, without altering backbone encoders.
Findings
Significant improvements on fine-grained retrieval tasks
Effective across both text and vision modalities
Plug-in design preserves semantic integrity
Abstract
As queries in retrieval-augmented generation (RAG) pipelines powered by large language models (LLMs) become increasingly complex and diverse, dense retrieval models have demonstrated strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks, where precise keyword alignment and span-level localization are required, even in cases with high lexical overlap that would intuitively suggest easier retrieval. To systematically evaluate this limitation, we introduce two targeted tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, we propose LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using three…
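
The abstract is cut off before it names the three paradigms, so the following is only a rough sketch of what "input-aware vector modulation" of a dense query embedding could look like, not the authors' method. It assumes a count-based token vector, a random projection matrix standing in for a learned one, and element-wise scaling of the output of a frozen encoder. Every name here (token_count_vector, project, modulate, weight) is hypothetical.

```python
import math
import random

def token_count_vector(token_ids, vocab_size):
    """Bag-of-tokens counts over a vocabulary (assumed form of the token signal)."""
    counts = [0.0] * vocab_size
    for t in token_ids:
        counts[t] += 1.0
    return counts

def project(counts, weight):
    """Map a vocab-sized vector to embedding size; weight is [vocab_size][dim]."""
    dim = len(weight[0])
    out = [0.0] * dim
    for t, c in enumerate(counts):
        if c:
            for j in range(dim):
                out[j] += c * weight[t][j]
    return out

def modulate(query_emb, enhancement):
    """Scale each dimension by 1 + tanh(e), a factor in (0, 2), then L2-normalise."""
    mod = [q * (1.0 + math.tanh(e)) for q, e in zip(query_emb, enhancement)]
    norm = math.sqrt(sum(x * x for x in mod)) or 1.0
    return [x / norm for x in mod]

# Toy setup: all values below are illustrative assumptions.
rng = random.Random(0)
vocab_size, dim = 8, 4
weight = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
query_emb = [0.5, -0.5, 0.5, 0.5]  # stand-in for a frozen backbone encoder's output
token_ids = [1, 3, 3, 5]           # stand-in for the tokenised query

enhancement = project(token_count_vector(token_ids, vocab_size), weight)
enhanced = modulate(query_emb, enhancement)
```

In this sketch the backbone embedding is only rescaled per dimension, never replaced, which is one way to read the plug-in design noted under Findings: a zero enhancement vector leaves the normalised query embedding unchanged.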
