DeeperImpact: Optimizing Sparse Learned Index Structures

Soyuj Basnet; Jerry Gou; Antonio Mallia; Torsten Suel

arXiv:2405.17093·cs.IR·July 9, 2024

TL;DR

DeeperImpact improves sparse learned-index retrieval by optimizing DeepImpact's document expansion (now using Llama 2) and its training strategy, narrowing the effectiveness gap with dense retrievers.

Contribution

The paper introduces a refined DeepImpact approach using Llama 2 for query prediction and effective training strategies, improving retrieval effectiveness over previous sparse models.

Findings

01. Replacing T5 with Llama 2 improves retrieval quality.
02. Hard negatives, distillation, and CoCondenser initialization boost performance.
03. DeeperImpact narrows the effectiveness gap with SPLADE.

Abstract

Much recent work has focused on sparse learned indexes that use deep neural architectures to significantly improve retrieval quality while keeping the efficiency benefits of the inverted index. While such sparse learned structures achieve effectiveness far beyond that of traditional inverted index-based rankers, there is still a gap in effectiveness to the best dense retrievers, or even to sparse methods that leverage more expensive optimizations such as query expansion and query term weighting. We focus on narrowing this gap by revisiting and optimizing DeepImpact, a sparse retrieval approach that uses DocT5Query for document expansion followed by a BERT language model to learn impact scores for document terms. We first reinvestigate the expansion process and find that the recently proposed Doc2Query-- query filtration does not enhance retrieval quality when used with…
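
As an illustration of how learned impact scores fit into an inverted index, the sketch below scores each document by summing the stored impacts of the query terms it contains. It is a toy example under stated assumptions, not the authors' implementation: the impact values are made up, and the names `build_index` and `search` are hypothetical. In the actual system, impacts would come from the trained model after document expansion.

```python
from collections import defaultdict


def build_index(doc_impacts):
    """Build an inverted index mapping term -> list of (doc_id, impact)."""
    index = defaultdict(list)
    for doc_id, impacts in doc_impacts.items():
        for term, impact in impacts.items():
            index[term].append((doc_id, impact))
    return index


def search(index, query_terms, k=10):
    """Score documents by summing impacts of matching query terms."""
    scores = defaultdict(float)
    for term in set(query_terms):  # each distinct query term counts once
        for doc_id, impact in index.get(term, []):
            scores[doc_id] += impact
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical per-document term impacts (illustrative values only).
doc_impacts = {
    "d1": {"sparse": 3.0, "index": 2.0, "retrieval": 1.5},
    "d2": {"dense": 2.5, "retrieval": 2.0},
    "d3": {"sparse": 1.0, "retrieval": 0.5, "llama": 4.0},
}

index = build_index(doc_impacts)
results = search(index, ["sparse", "retrieval"])
print(results)
```

Because the per-term impacts are precomputed at indexing time, query processing in this sketch needs only posting-list lookups and additions, which is the efficiency property of the inverted index that the abstract refers to.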

Code & Models

Repositories

basnetsoyuj/improving-learned-index (PyTorch, official)

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification

Methods: Attention Is All You Need · Byte Pair Encoding · WordPiece · SentencePiece · Linear Warmup With Linear Decay · Gated Linear Unit · Weight Decay · Attention Dropout · Linear Layer