ColBERT's [MASK]-based Query Augmentation: Effects of Quadrupling the Query Input Length

Ben Giacalone; Richard Zanibbi

arXiv:2408.13672 · cs.IR · August 27, 2024



Open Access

TL;DR

This paper investigates how changing the number of [MASK] tokens in ColBERT's queries affects retrieval performance. It finds that padding queries with [MASK]s to an average length of 32 tokens yields a large gain, and that extending them to four times the training query length (128 tokens) does not cause performance to collapse.

Contribution

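It shows that the [MASK]-based term weighting previously reported for ColBERTv1 also holds for ColBERTv2, and that padding queries with [MASK] tokens up to four times the training query length does not significantly degrade retrieval effectiveness, with gains plateauing once queries reach an average length of 32 tokens.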

Findings

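01. Adding only a few [MASK] tokens initially lowers retrieval performance, but adding enough to pad queries to an average length of 32 tokens yields a large improvement.

02. Performance plateaus once queries reach an average length of 32 tokens.

03. Extending queries to 128 tokens (four times the query input length used in training) does not significantly harm performance.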
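To make the padding concrete, here is a minimal sketch of [MASK]-based query augmentation, not ColBERT's actual tokenizer code. It assumes whitespace tokens in place of WordPiece ids, omits the [CLS] and query-marker tokens ColBERT prepends, and assumes a training query length of 32 (so 128 is four times that length). The helper name `pad_query` and the constant `TRAIN_QUERY_LEN` are hypothetical.

```python
from typing import List

MASK = "[MASK]"
TRAIN_QUERY_LEN = 32  # assumed training query length; 4 * 32 = 128


def pad_query(tokens: List[str], target_len: int) -> List[str]:
    """Hypothetical helper: truncate to target_len, then pad with [MASK] tokens.

    Whitespace tokens stand in for real WordPiece ids; special markers are omitted.
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    kept = tokens[:target_len]
    return kept + [MASK] * (target_len - len(kept))


query = "what is late interaction".split()
for length in (TRAIN_QUERY_LEN, 4 * TRAIN_QUERY_LEN):
    padded = pad_query(query, length)
    # Print the target length and how many [MASK] tokens were appended.
    print(length, padded.count(MASK))
```

With this four-token query, the sketch appends 28 [MASK]s at length 32 and 124 at length 128; only the number of appended [MASK]s changes between the two settings.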

Abstract

A unique aspect of ColBERT is its use of [MASK] tokens in queries to score documents (query augmentation). Prior work shows [MASK] tokens weighting non-[MASK] query terms, emphasizing certain tokens over others, rather than introducing whole new terms as initially proposed. We begin by demonstrating that a term weighting behavior previously reported for [MASK] tokens in ColBERTv1 holds for ColBERTv2. We then examine the effect of changing the number of [MASK] tokens from zero to up to four times past the query input length used in training, both for first-stage retrieval and for scoring candidates, observing an initial decrease in performance with few [MASK]s, a large increase when enough [MASK]s are added to pad queries to an average length of 32, then a plateau in performance afterwards. Additionally, we compare baseline performance to performance when the query length is extended…
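The term weighting behavior described in the abstract can be illustrated with ColBERT-style MaxSim scoring, where each query embedding contributes its best dot-product match among the document embeddings and these maxima are summed. The sketch below uses hand-picked 2-d vectors and simply assumes a [MASK] embedding that coincides with the embedding of the query term "apple"; real ColBERT embeddings are learned, higher-dimensional, and L2-normalized, so this only shows why a [MASK] vector close to a query term acts like extra weight on that term.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late interaction: for each query vector, take its best match among
    the document vectors, then sum these maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical 2-d unit embeddings for two query terms.
APPLE = [1.0, 0.0]
PIE = [0.0, 1.0]

doc_apple = [APPLE]  # document matching only "apple"
doc_pie = [PIE]      # document matching only "pie"

query = [APPLE, PIE]
# Assumption for illustration: the [MASK] embedding lands on top of "apple".
query_with_mask = query + [APPLE]

print(maxsim_score(query, doc_apple), maxsim_score(query, doc_pie))  # 1.0 1.0
print(maxsim_score(query_with_mask, doc_apple), maxsim_score(query_with_mask, doc_pie))  # 2.0 1.0
```

Without the [MASK] vector the two documents tie; with it, the "apple" match is counted twice. The [MASK] adds no new term to the query, it only reweights an existing one, which is the distinction the abstract draws.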


Taxonomy

Topics: Cloud Computing and Resource Management