Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance
Gregory Polyakov, Harrisen Scells, Carsten Eickhoff

TL;DR
This paper systematically investigates the phenomenon of wacky weights in SPLADE models, revealing their origins, prevalence, and impact on retrieval effectiveness through comprehensive experiments and formal analysis.
Contribution
The paper provides the first formal definition of wacky weights, introduces a novel measure of their prevalence, and analyzes the factors that influence their occurrence in SPLADE models.
Findings
Larger vocabularies increase wacky token prevalence.
Stricter sparsity regularizers reduce wacky token prevalence.
Wacky weights mainly contribute to in-domain retrieval effectiveness.
Abstract
Learned sparse retrieval models such as SPLADE combine the effectiveness of neural architectures with the efficiency of inverted indices. As these models assign weights to terms from a fixed vocabulary, interpretability is often touted as a major benefit of these models. However, the emergence of wacky weights, i.e., expansion terms that appear semantically unrelated to the input, limits interpretability. While prior research has anecdotally observed this phenomenon, there is a lack of systematic understanding regarding their origins, prevalence, and contribution to retrieval effectiveness. In this paper, we reproduce SPLADE-v2 to systematically investigate wacky weights across the SPLADE family of models. We present a comprehensive dissection of wacky weights, providing a formal definition of wackiness based on the lexical utility of expansion terms. Furthermore, we introduce a novel…
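To make "weights over a fixed vocabulary" and "expansion terms" concrete, here is a minimal sketch of the max-pooling formulation commonly associated with SPLADE-v2, where each vocabulary term gets the weight max_i log(1 + ReLU(logit_ij)) over input positions i. The toy vocabulary, the logits, and the helper names `splade_weights` and `expansion_terms` are illustrative assumptions, not taken from the paper. The sketch only lists expansion terms (weighted terms absent from the input); it does not implement the paper's formal wackiness definition, which the excerpt above does not spell out.

```python
import numpy as np

def splade_weights(logits: np.ndarray) -> np.ndarray:
    """Turn per-position MLM logits of shape (seq_len, vocab_size)
    into one non-negative weight per vocabulary term."""
    # Log-saturate the positive logits, then max-pool over input positions.
    return np.log1p(np.maximum(logits, 0.0)).max(axis=0)

def expansion_terms(weights, input_ids, vocab):
    """Return terms with non-zero weight that do not occur in the input."""
    present = set(input_ids)
    return {
        vocab[j]: float(w)
        for j, w in enumerate(weights)
        if w > 0 and j not in present
    }

# Hypothetical 4-term vocabulary and made-up logits for a 2-token input.
vocab = ["sparse", "retrieval", "index", "banana"]
logits = np.array([
    [2.0, -1.0, 0.5, 0.0],   # position 0: input token "sparse"
    [-0.5, 3.0, -2.0, 0.1],  # position 1: input token "retrieval"
])
input_ids = [0, 1]

weights = splade_weights(logits)
expanded = expansion_terms(weights, input_ids, vocab)
print(expanded)  # terms the model added beyond the input tokens
```

In this toy example both "index" and "banana" receive non-zero weight without appearing in the input. Only the second looks semantically unrelated, which is the kind of term the abstract calls a wacky weight.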
