NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models
Haruki Fujimaki, Makoto P. Kato

TL;DR
NumColBERT is a non-intrusive method that improves retrieval for queries with numerical conditions in dense retrieval models, while keeping deployment simple.
Contribution
The paper proposes an inference-time approach that uses a Numerical Gating Mechanism and Contrastive Learning to better handle numerical queries in late-interaction retrieval models.
Findings
NumColBERT outperforms standard fine-tuning baselines.
Achieves accuracy comparable to or better than prior separate-scoring methods.
Retains standard ColBERT indexing and scoring, enabling easy deployment.
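The standard ColBERT scoring that NumColBERT is said to retain is usually described as MaxSim late interaction: each query token embedding is matched to its most similar document token embedding, and the per-token maxima are summed. The sketch below illustrates only that generic scoring step. It is not the authors' implementation and does not model the Numerical Gating Mechanism. The function names (`maxsim_score`, `rank`) and the toy 2-dimensional embeddings are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Generic MaxSim: for each query token, take the best-matching
    document token similarity, then sum over query tokens."""
    if not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector], docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy token embeddings (hypothetical values, 2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query token
}

print(rank(query, docs))
```

Because this scoring function depends only on the token embeddings, a method that changes how embeddings are produced but leaves the scoring untouched can reuse an existing index and scoring pipeline, which is the deployment property the findings point to.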
Abstract
This study addresses the challenge of improving dense retrieval performance for queries containing numerical conditions, such as "companies with more than one billion dollars in R&D expenditure." Although recent research has shown that standard models struggle with numeric information in domains such as finance, e-commerce, and medicine, existing solutions typically decompose queries into textual and numerical components and score them separately. These approaches modify late-interaction retrieval models such as ColBERT and introduce challenges in deployment, latency, and maintainability. To overcome these limitations, we propose NumColBERT, an inference-time non-intrusive method that enhances numerically conditioned retrieval while preserving the original late-interaction mechanism. Because NumColBERT retains the standard ColBERT indexing and MaxSim scoring pipeline, existing…
