Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
Sebastian Hofstätter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury

TL;DR
ColBERTer is a neural retrieval model that combines contextualized late interaction with enhanced reduction techniques, significantly lowering storage needs while maintaining or improving effectiveness and interpretability across multiple datasets.
Contribution
The paper introduces ColBERTer, a neural retrieval model that fuses single-vector retrieval, multi-vector refinement, and optional lexical matching, and uses multi-task training to reduce storage and improve interpretability.
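The fusion of single-vector retrieval and multi-vector refinement can be illustrated with a minimal pure-Python sketch. It assumes toy, precomputed embeddings. The function names, the candidate cutoff `k`, and the fixed mixing weight `alpha` are hypothetical. The weighted sum stands in for whatever score aggregation the paper actually learns, and the optional lexical component is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """ColBERT-style late interaction: each query vector takes its best-matching document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def retrieve_and_refine(
    query_cls: Vector,
    query_vecs: List[Vector],
    doc_cls: Dict[str, Vector],
    doc_vecs: Dict[str, List[Vector]],
    k: int = 2,
    alpha: float = 0.5,  # hypothetical fixed mixing weight, not the paper's learned aggregation
) -> List[Tuple[str, float]]:
    # Stage 1: single-vector retrieval by dot product; keep the top-k candidates.
    candidates = sorted(doc_cls, key=lambda d: dot(query_cls, doc_cls[d]), reverse=True)[:k]
    # Stage 2: refine the candidates with the multi-vector score and fuse both scores.
    fused = []
    for d in candidates:
        s_single = dot(query_cls, doc_cls[d])
        s_multi = max_sim(query_vecs, doc_vecs[d])
        fused.append((d, alpha * s_single + (1 - alpha) * s_multi))
    return sorted(fused, key=lambda item: item[1], reverse=True)

# Toy example: d2 is ranked below d1 by the single vector but wins after refinement.
doc_cls = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}
doc_vecs = {"d1": [[1.0, 0.0]], "d2": [[0.0, 1.0], [1.0, 0.0]], "d3": [[0.0, 1.0]]}
ranking = retrieve_and_refine([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], doc_cls, doc_vecs, k=2)
print(ranking)
```

With this design, the cheap single-vector stage limits how many documents the more expensive multi-vector scoring has to touch.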
Findings
Reduces storage footprint by up to 2.5x without losing effectiveness.
Reaches index storage parity with the plaintext collection size in its lowest-dimensional configuration.
Demonstrates statistically significant gains over traditional baselines on diverse datasets.
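Per document, the storage of a multi-vector index scales with the number of stored vectors times their dimensionality. This is why removing vectors and shrinking dimensions both reduce the footprint. The sketch below assumes float16 storage (2 bytes per value). Its document sizes are purely hypothetical: they illustrate the formula and are not the paper's configurations or results.

```python
def index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to store one document's vectors (float16 assumed by default)."""
    return num_vectors * dim * bytes_per_value

def reduction_factor(before: int, after: int) -> float:
    return before / after

# Hypothetical baseline: 120 subword tokens, each stored as a 128-dim vector.
baseline = index_bytes(num_vectors=120, dim=128)
# Hypothetical reduced form: 50 whole-word vectors kept after removal, 32 dims each.
reduced = index_bytes(num_vectors=50, dim=32)
print(baseline, reduced, reduction_factor(baseline, reduced))
```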
Abstract
Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reductions dramatically lower ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors per document by learning unique whole-word representations for the terms in each document and learning to identify and remove word representations…
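The multi-vector reduction has two steps: collapsing subword vectors into unique whole-word vectors, then removing some of those words. A minimal sketch follows. Several choices in it are assumptions, not the paper's exact method:
- WordPiece-style tokens, where `##` marks a continuation piece.
- Mean pooling within a word.
- Averaging to merge repeated words.
- A ReLU-clipped per-word gate that both drops words and scales the kept vectors.

The function names and gate values are hypothetical.

```python
from typing import Dict, List, Sequence

Vec = List[float]

def mean(vectors: Sequence[Vec]) -> Vec:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def pool_whole_words(tokens: List[str], vectors: List[Vec]) -> Dict[str, Vec]:
    """Collapse subword vectors into one vector per unique whole word (assumes '##' continuations)."""
    words: List[str] = []
    pieces: List[List[Vec]] = []
    for token, vec in zip(tokens, vectors):
        if token.startswith("##") and words:
            words[-1] += token[2:]  # extend the current word
            pieces[-1].append(vec)
        else:
            words.append(token)  # start a new word occurrence
            pieces.append([vec])
    # Mean-pool the pieces of each word occurrence (assumed pooling choice).
    occurrences = [mean(p) for p in pieces]
    # Merge repeated occurrences of the same word into one unique entry (assumed: averaging).
    grouped: Dict[str, List[Vec]] = {}
    for word, vec in zip(words, occurrences):
        grouped.setdefault(word, []).append(vec)
    return {word: mean(vecs) for word, vecs in grouped.items()}

def remove_words(word_vecs: Dict[str, Vec], gate_scores: Dict[str, float],
                 threshold: float = 0.0) -> Dict[str, Vec]:
    """Drop words whose ReLU-clipped gate is not above `threshold`; scale the rest by their gate."""
    kept: Dict[str, Vec] = {}
    for word, vec in word_vecs.items():
        gate = max(0.0, gate_scores.get(word, 0.0))
        if gate > threshold:
            kept[word] = [gate * x for x in vec]
    return kept

tokens = ["late", "inter", "##action", "the", "late"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0], [0.5, 0.5], [3.0, 0.0]]
words = pool_whole_words(tokens, vectors)  # 5 token vectors -> 3 unique whole words
gates = {"late": 1.0, "interaction": 0.5, "the": -0.2}  # hypothetical learned gate outputs
kept = remove_words(words, gates)  # "the" is removed, leaving 2 stored vectors
print(kept)
```

Each stored vector now corresponds to a readable whole word rather than a subword piece. This is the property the abstract links to more interpretable token-matching scores.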
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
