Declarative Experimentation in Information Retrieval using PyTerrier

Craig Macdonald; Nicola Tonellotto

arXiv:2007.14271·cs.IR·July 29, 2020

TL;DR

This paper introduces PyTerrier, a declarative framework for designing, optimizing, and evaluating complex information retrieval (IR) pipelines. It aims to make IR experimentation as expressive as deep learning frameworks make neural network design, while also making pipelines more efficient to execute.

Contribution

PyTerrier offers a declarative way to express IR pipelines close to their conceptual design and optimizes them automatically for the target backend, giving IR a formalism comparable to that of deep learning platforms.

Findings

01. Automatic optimizations improve the efficiency of retrieval pipelines on the TREC Robust and ClueWeb09 test collections.

02. PyTerrier allows IR pipelines to be expressed in a form close to their conceptual design.

03. The framework supports execution on multiple IR backends, such as Anserini and Terrier.

Abstract

The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on TREC Robust and ClueWeb09 test collections,…
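
The idea of expressing a pipeline declaratively and then rewriting it for a backend can be illustrated with a toy sketch. The Python below is not PyTerrier's API: the class names `Retrieve`, `Rerank`, `Then`, and `Cutoff`, the term-overlap scoring, and the `optimise` function are all hypothetical and invented for this illustration. It assumes that operator overloading is used for composition (`>>` to chain stages, `%` for a rank cutoff) and shows one plausible rewrite: pushing a cutoff into the retrieval stage so that the stand-in backend returns only the top k documents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Ranking = List[Tuple[str, float]]  # (docid, score) pairs, best first


class Transformer:
    """Hypothetical base class: a pipeline stage mapping a ranking to a ranking."""

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        raise NotImplementedError

    def __rshift__(self, other: "Transformer") -> "Transformer":
        return Then(self, other)      # a >> b : run a, then b

    def __mod__(self, k: int) -> "Transformer":
        return Cutoff(self, k)        # a % k : keep the top k results of a


@dataclass
class Retrieve(Transformer):
    """Toy stand-in for a backend: scores documents by query-term counts."""
    docs: Dict[str, str]
    k: Optional[int] = None           # None means "return every match"

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        terms = query.lower().split()
        scored = []
        for docid, text in self.docs.items():
            tokens = text.lower().split()
            score = float(sum(tokens.count(t) for t in terms))
            if score > 0:
                scored.append((docid, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[: self.k]


@dataclass
class Rerank(Transformer):
    """Rescores each (docid, score) pair with a user-supplied function."""
    score_fn: Callable[[str, float], float]

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        rescored = [(d, self.score_fn(d, s)) for d, s in ranking]
        rescored.sort(key=lambda pair: (-pair[1], pair[0]))
        return rescored


@dataclass
class Then(Transformer):
    first: Transformer
    second: Transformer

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        return self.second.transform(query, self.first.transform(query, ranking))


@dataclass
class Cutoff(Transformer):
    inner: Transformer
    k: int

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        return self.inner.transform(query, ranking)[: self.k]


def optimise(p: Transformer) -> Transformer:
    """Illustrative rewrite rule: fold Cutoff(Retrieve, k) into Retrieve(k=k)."""
    if isinstance(p, Cutoff):
        inner = optimise(p.inner)
        if isinstance(inner, Retrieve):
            k = p.k if inner.k is None else min(p.k, inner.k)
            return Retrieve(inner.docs, k)
        return Cutoff(inner, p.k)
    if isinstance(p, Then):
        return Then(optimise(p.first), optimise(p.second))
    return p


docs = {
    "d1": "terrier retrieval platform",
    "d2": "python retrieval retrieval pipelines",
    "d3": "deep learning on gpu",
    "d4": "retrieval evaluation",
}
query = "retrieval pipelines"

# The pipeline is a value built from operators, not a sequence of imperative steps.
pipeline = Retrieve(docs) % 2 >> Rerank(lambda docid, score: score / 2)
optimised = optimise(pipeline)

original_result = pipeline.transform(query, [])
optimised_result = optimised.transform(query, [])
```

Both pipelines produce the same ranking, but in the optimised form the retrieval stage itself is asked for only two documents. Because the pipeline is an ordinary data structure, a rewrite of this kind can inspect and transform it before anything is executed.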
