A White Box Analysis of ColBERT
Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

TL;DR
This paper analyzes the ColBERT model in detail, showing that it captures a notion of term importance and relies on exact matches for important terms, even though the classical IR axioms are not formally verified for it.
Contribution
It offers a white-box dissection of ColBERT's matching process, analyzing term importance and exact versus soft matching patterns.
Findings
ColBERT captures a notion of term importance
It relies on exact matches for important terms
Traditional IR axioms are not formally verified in ColBERT
Abstract
Transformer-based models are nowadays state-of-the-art in ad-hoc Information Retrieval, but their behavior is far from being understood. Recent work has claimed that BERT does not satisfy the classical IR axioms. However, we propose to dissect the matching process of ColBERT, through the analysis of term importance and exact/soft matching patterns. Even if the traditional axioms are not formally verified, our analysis reveals that ColBERT: (i) is able to capture a notion of term importance; (ii) relies on exact matches for important terms.
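The exact/soft matching distinction in the abstract can be made concrete with a minimal sketch of ColBERT-style late interaction, where each query token is scored by its maximum cosine similarity over the document tokens and the per-token maxima are summed. The function name `maxsim_breakdown`, the toy tokens, and the 2-D vectors below are hypothetical and chosen only for illustration; in the real model the vectors are contextual embeddings produced by BERT, and this is not the authors' analysis code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_breakdown(query_tokens, query_vecs, doc_tokens, doc_vecs):
    """Score a document with a sum-of-max-similarities rule.

    Returns (total_score, rows), where each row is
    (query_token, best_doc_token, similarity, match_type).
    A match is labeled "exact" when the best-matching document token
    is the same string as the query token, and "soft" otherwise.
    """
    doc_unit = [normalize(v) for v in doc_vecs]
    rows = []
    total = 0.0
    for q_tok, q_vec in zip(query_tokens, query_vecs):
        q_unit = normalize(q_vec)
        sims = [sum(a * b for a, b in zip(q_unit, d)) for d in doc_unit]
        best = max(range(len(sims)), key=sims.__getitem__)
        match_type = "exact" if doc_tokens[best] == q_tok else "soft"
        rows.append((q_tok, doc_tokens[best], sims[best], match_type))
        total += sims[best]
    return total, rows


# Hypothetical toy embeddings (assumption: 2-D vectors, made up for this sketch).
query_tokens = ["cheap", "flights"]
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = ["low", "cost", "flights"]
doc_vecs = [[0.8, 0.6], [0.6, 0.8], [0.0, 2.0]]

total, rows = maxsim_breakdown(query_tokens, query_vecs, doc_tokens, doc_vecs)
for q_tok, d_tok, sim, match_type in rows:
    print(f"{q_tok} -> {d_tok} ({match_type}, sim={sim:.2f})")
print(f"score = {total:.2f}")
```

Breaking the score down per query token in this way shows which terms contribute most to the total and whether each contribution comes from an exact or a soft match, which is the kind of pattern the analysis examines.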
Taxonomy
Topics: Data Management and Algorithms · Recommender Systems and Techniques · Algorithms and Data Compression
Methods: Linear Layer · WordPiece · Linear Warmup With Linear Decay · Adam · Attention Is All You Need · Layer Normalization · Dropout · Weight Decay · Dense Connections
