SuiteEval: Simplifying Retrieval Benchmarks

Andrew Parry; Debasis Ganguly; Sean MacAvaney

arXiv:2602.18107 · cs.IR · February 23, 2026

TL;DR

SuiteEval is a unified framework that streamlines and standardizes retrieval evaluation, making it easier to run reproducible, comparable benchmarks across datasets and models.

Contribution

SuiteEval introduces an automated end-to-end evaluation pipeline with dynamic indexing and built-in support for multiple benchmarks, simplifying IR evaluation workflows.

Findings

01 Supports major benchmarks: BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT

02 Reduces boilerplate and standardizes the evaluation process

03 Allows new benchmark suites to be added in a single line

Abstract

Information retrieval evaluation often suffers from fragmented practices -- varying dataset subsets, aggregation methods, and pipeline configurations -- that undermine reproducibility and comparability, especially for foundation embedding models requiring robust out-of-domain performance. We introduce SuiteEval, a unified framework that offers automatic end-to-end evaluation, dynamic indexing that reuses on-disk indices to minimise disk usage, and built-in support for major benchmarks (BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT). Users only need to supply a pipeline generator. SuiteEval handles data loading, indexing, ranking, metric computation, and result aggregation. New benchmark suites can be added in a single line. SuiteEval reduces boilerplate and standardises evaluations to facilitate reproducible IR research, as a broader benchmark set is increasingly required.
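
The division of labour described in the abstract can be illustrated with a small, self-contained sketch: the user supplies a pipeline generator, and a harness handles ranking, metric computation, and aggregation. This is not SuiteEval's API. Every name here (`term_overlap_pipelines`, `evaluate_suite`, `TOY_SUITE`) is hypothetical, the "index" is an in-memory token set standing in for real indexing, and the macro-average over datasets is one assumed aggregation choice among several.

```python
from statistics import mean
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical toy types; none of these names come from SuiteEval itself.
Corpus = Dict[str, str]               # doc_id -> text
Queries = Dict[str, str]              # query_id -> text
Qrels = Dict[str, Set[str]]           # query_id -> relevant doc_ids
Ranker = Callable[[str], List[str]]   # query text -> ranked doc_ids
PipelineGenerator = Callable[[Corpus], Iterator[Tuple[str, Ranker]]]


def term_overlap_pipelines(corpus: Corpus) -> Iterator[Tuple[str, Ranker]]:
    """Toy pipeline generator: yields (system name, ranker) pairs for a corpus."""
    # Stand-in for indexing: tokenise every document once.
    tokens = {doc_id: set(text.lower().split()) for doc_id, text in corpus.items()}

    def rank(query: str) -> List[str]:
        q = set(query.lower().split())
        # Sort by descending term overlap, breaking ties by doc_id.
        return sorted(tokens, key=lambda d: (-len(q & tokens[d]), d))

    yield "term-overlap", rank


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is ranked."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate_suite(
    suite: Dict[str, Tuple[Corpus, Queries, Qrels]],
    pipeline_generator: PipelineGenerator,
) -> Dict[str, Dict[str, float]]:
    """Run every generated pipeline on every dataset and aggregate the scores."""
    per_system: Dict[str, Dict[str, float]] = {}
    for dataset, (corpus, queries, qrels) in suite.items():
        for system, rank in pipeline_generator(corpus):
            scores = [reciprocal_rank(rank(text), qrels[qid]) for qid, text in queries.items()]
            per_system.setdefault(system, {})[dataset] = mean(scores)
    # Assumed aggregation: unweighted mean over datasets (macro-average).
    return {
        system: {**scores, "macro_avg": mean(list(scores.values()))}
        for system, scores in per_system.items()
    }


# Two tiny made-up datasets standing in for a benchmark suite.
TOY_SUITE = {
    "toy-a": (
        {"d1": "neural ranking models", "d2": "cooking pasta at home"},
        {"q1": "neural ranking"},
        {"q1": {"d1"}},
    ),
    "toy-b": (
        {"d1": "sparse retrieval with bm25", "d2": "dense retrieval with embeddings"},
        {"q1": "retrieval with"},
        {"q1": {"d2"}},
    ),
}

results = evaluate_suite(TOY_SUITE, term_overlap_pipelines)
```

Because the generator receives the corpus and yields named rankers, the same user function can be reused unchanged across every dataset in the suite, which is the property that removes per-benchmark boilerplate. Fixing the aggregation inside the harness rather than in user code is what keeps results comparable between runs.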

Taxonomy

Topics: Information Retrieval and Search Behavior · Libraries and Information Services · Research Data Management Practices