Exploring the Representation Power of SPLADE Models

Joel Mackenzie; Shengyao Zhuang; Guido Zuccon

arXiv:2306.16680·cs.IR·June 30, 2023

TL;DR

This paper investigates how the SPLADE model encodes ranking signals in sparse document representations, and finds that it remains effective even when its vocabulary is restricted to stopwords or random terms.

Contribution

The paper provides empirical evidence that SPLADE can encode useful ranking signals beyond traditional lexical features, which broadens the understanding of what its sparse representations capture.

Findings

01. SPLADE encodes useful ranking signals even when its vocabulary is limited to stopwords or random words.

02. Constrained vocabularies do not significantly reduce SPLADE's effectiveness.

03. SPLADE's representations capture more than just lexical matching.

Abstract

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models. During training, SPLADE applies regularization to ensure postings lists are kept sparse -- with the aim of mimicking the properties of natural term distributions -- allowing efficient and effective lexical matching and ranking. However, we hypothesize that SPLADE may encode additional signals into common postings lists to further improve effectiveness. To explore this idea, we perform a number of empirical analyses where we re-train SPLADE with different, controlled vocabularies and measure how effective it is at ranking passages. Our findings suggest that SPLADE can effectively encode useful ranking signals in documents even when the vocabulary is constrained to terms that are not…
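The mechanics the abstract refers to (term impact scores, a sparsity regularizer, and a restricted vocabulary) can be illustrated with a small sketch. This is not the authors' code. It assumes the max-pooling, log-saturated ReLU weighting and the FLOPS regularizer described in the published SPLADE formulation, neither of which is spelled out on this page, and the function names, the toy logits, and the three-stopword vocabulary are all hypothetical.

```python
import math

def splade_weights(logits, vocab):
    """Turn per-token logits (rows over the vocabulary) into sparse term weights.

    Assumed formulation: w_j = max_i log(1 + ReLU(logit_ij)).
    Zero weights are dropped, so the result behaves like a postings entry.
    """
    weights = {}
    for j, term in enumerate(vocab):
        w = max(math.log1p(max(row[j], 0.0)) for row in logits)
        if w > 0.0:
            weights[term] = w
    return weights

def flops_penalty(batch_weights, vocab):
    """Assumed FLOPS regularizer: sum over terms of (mean weight in batch)^2."""
    n = len(batch_weights)
    return sum(
        (sum(w.get(term, 0.0) for w in batch_weights) / n) ** 2
        for term in vocab
    )

def score(query_w, doc_w):
    """Rank by the dot product of the sparse query and document vectors."""
    return sum(qw * doc_w.get(term, 0.0) for term, qw in query_w.items())

# Hypothetical controlled vocabulary made only of stopwords.
vocab = ["the", "of", "and"]

# Toy logits: one row per input token, one column per vocabulary term.
doc_logits = [[1.0, -2.0, 0.0], [0.5, 3.0, -1.0]]
query_logits = [[0.0, 2.0, 0.0]]

doc_w = splade_weights(doc_logits, vocab)
query_w = splade_weights(query_logits, vocab)
relevance = score(query_w, doc_w)
penalty = flops_penalty([doc_w], vocab)
```

In this toy setup the document still receives a nonzero score although every term in the vocabulary is a stopword, which is the kind of behaviour the paper's controlled-vocabulary experiments set out to measure. The penalty grows with the average weight placed on each term, so minimizing it pushes weights toward zero and keeps postings lists short.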

Code & Models

Repositories

ielab/understanding-splade (jax, Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications