Neural Passage Quality Estimation for Static Pruning

Xuejun Chang; Debabrata Mishra; Craig Macdonald; Sean MacAvaney

arXiv:2407.12170 · cs.IR · July 18, 2024

TL;DR

This paper introduces neural methods to predict passage quality independently of queries, enabling significant corpus pruning that reduces resource consumption while maintaining search effectiveness.

Contribution

It presents novel neural techniques for query-agnostic passage quality estimation, allowing effective corpus pruning in neural search engines.

Findings

01. Prunes more than 25% of passages while keeping retrieval effectiveness statistically equivalent

02. Reduces computational resources, power, and carbon footprint

03. Enables lightweight pre-pruning before encoding steps
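
The pre-pruning finding can be illustrated with a minimal sketch. It assumes a query-agnostic scorer and an expensive encoder, both passed in as plain callables. The names `pre_prune`, `index_corpus`, `quality_fn`, `encode_fn`, and `threshold` are hypothetical and are not taken from the paper or from pyterrier-quality.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def pre_prune(
    passages: Iterable[str],
    quality_fn: Callable[[str], float],  # hypothetical query-agnostic scorer
    threshold: float,                    # hypothetical cut-off, tuned per corpus
) -> Iterator[str]:
    """Yield only the passages whose quality score reaches the threshold."""
    for passage in passages:
        if quality_fn(passage) >= threshold:
            yield passage


def index_corpus(
    passages: Iterable[str],
    quality_fn: Callable[[str], float],
    encode_fn: Callable[[str], T],       # stand-in for an expensive neural encoder
    threshold: float,
) -> List[T]:
    """Encode only the passages that survive pre-pruning."""
    return [encode_fn(p) for p in pre_prune(passages, quality_fn, threshold)]
```

Because filtering happens before `encode_fn` is called, a pruned passage never incurs encoding cost in this sketch. That is the sense in which the saving comes before the encoding step.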

Abstract

Neural networks -- especially those that use large, pre-trained language models -- have improved search engines in various ways. Most prominently, they can estimate the relevance of a passage or document to a user's query. In this work, we depart from this direction by exploring whether neural networks can effectively predict which of a document's passages are unlikely to be relevant to any query submitted to the search engine. We refer to this query-agnostic estimation of passage relevance as a passage's quality. We find that our novel methods for estimating passage quality allow passage corpora to be pruned considerably while maintaining statistically equivalent effectiveness; our best methods can consistently prune >25% of passages in a corpus, across various retrieval pipelines. Such substantial pruning reduces the operating costs of neural search engines in terms of computing…
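
As a rough illustration of static pruning by quality, the sketch below scores every passage once, without any query, and drops the lowest-scoring fraction. It is not the authors' implementation: `prune_lowest_quality` is a hypothetical helper, and `toy_quality` is a deliberately crude stand-in for the neural quality estimator described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def prune_lowest_quality(
    passages: Sequence[str],
    quality_fn: Callable[[str], float],
    prune_fraction: float = 0.25,
) -> Tuple[List[str], List[str]]:
    """Split passages into (kept, pruned) by dropping the lowest-scoring fraction."""
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    # Score each passage once; no query is involved at any point.
    scored = [(quality_fn(p), i) for i, p in enumerate(passages)]
    n_prune = int(len(passages) * prune_fraction)
    # Indices of the n_prune lowest-quality passages (ties broken by position).
    pruned_idx = {i for _, i in sorted(scored)[:n_prune]}
    kept = [p for i, p in enumerate(passages) if i not in pruned_idx]
    pruned = [p for i, p in enumerate(passages) if i in pruned_idx]
    return kept, pruned


def toy_quality(passage: str) -> float:
    """Hypothetical scorer: share of purely alphabetic tokens. NOT the paper's model."""
    tokens = passage.split()
    if not tokens:
        return 0.0
    return sum(t.isalpha() for t in tokens) / len(tokens)
```

In practice `quality_fn` would be a learned model, and `prune_fraction` would be chosen by checking that effectiveness on held-out queries stays statistically equivalent to the unpruned corpus.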

Code & Models

Repositories

terrierteam/pyterrier-quality · PyTorch · Official

Models

🤗 pyterrier-quality/mqt5-small · model · 2 dl

Taxonomy

Methods: Pruning