SuiteEval: Simplifying Retrieval Benchmarks
Andrew Parry, Debasis Ganguly, Sean MacAvaney

TL;DR
SuiteEval is a unified framework that streamlines and standardizes retrieval evaluation, making it easier to run reproducible, comparable benchmarks across datasets and models.
Contribution
SuiteEval introduces an automated end-to-end evaluation pipeline with dynamic indexing that reuses on-disk indices and built-in support for multiple benchmark suites, simplifying IR evaluation workflows.
Findings
Supports major benchmarks out of the box: BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT
Reduces boilerplate and standardizes the evaluation process
Allows new benchmark suites to be added in a single line
Abstract
Information retrieval evaluation often suffers from fragmented practices -- varying dataset subsets, aggregation methods, and pipeline configurations -- that undermine reproducibility and comparability, especially for foundation embedding models requiring robust out-of-domain performance. We introduce SuiteEval, a unified framework that offers automatic end-to-end evaluation, dynamic indexing that reuses on-disk indices to minimise disk usage, and built-in support for major benchmarks (BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT). Users only need to supply a pipeline generator. SuiteEval handles data loading, indexing, ranking, metric computation, and result aggregation. New benchmark suites can be added in a single line. SuiteEval reduces boilerplate and standardises evaluations to facilitate reproducible IR research, as a broader benchmark set is increasingly required.
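The division of labour described in the abstract (the user supplies a pipeline generator, the framework does everything else) can be sketched in plain Python. This is a toy illustration of the pattern only, not SuiteEval's actual API: `IndexCache`, `overlap_pipelines`, `evaluate_suite`, the toy datasets, and the choice of reciprocal rank with an unweighted mean across datasets are all assumptions made for the example. An in-memory cache stands in for the on-disk index reuse the abstract mentions.

```python
from collections import defaultdict
from statistics import mean


class IndexCache:
    """Hypothetical stand-in for index reuse: build once per corpus, then reuse."""

    def __init__(self):
        self._indices = {}
        self.builds = 0  # counts how many indices were actually built

    def get(self, name, docs):
        if name not in self._indices:
            self.builds += 1
            index = defaultdict(set)
            for doc_id, text in docs.items():
                for term in text.lower().split():
                    index[term].add(doc_id)
            self._indices[name] = index
        return self._indices[name]


def overlap_pipelines(index):
    """Hypothetical pipeline generator: yields (name, ranker) pairs for one corpus."""

    def rank(query):
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, ()):
                scores[doc_id] += 1
        # Highest term overlap first; ties broken by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))

    yield "term-overlap", rank


def reciprocal_rank(ranking, relevant):
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate_suite(suite, pipeline_generator, cache):
    """Load each dataset, index it, rank, score, and aggregate per pipeline."""
    results = {}
    for dataset_name, (docs, queries, qrels) in suite.items():
        index = cache.get(dataset_name, docs)
        for pipeline_name, rank in pipeline_generator(index):
            scores = [reciprocal_rank(rank(text), qrels[qid])
                      for qid, text in queries.items()]
            results.setdefault(pipeline_name, {})[dataset_name] = mean(scores)
    # Assumed aggregation: unweighted mean over datasets.
    for per_dataset in results.values():
        per_dataset["mean"] = mean(per_dataset.values())
    return results


# Toy suite: dataset name -> (documents, queries, relevance judgements).
suite = {
    "toy-a": ({"d1": "neural ranking models", "d2": "cooking pasta"},
              {"q1": "neural ranking"}, {"q1": {"d1"}}),
    "toy-b": ({"d1": "sparse retrieval index", "d2": "dense retrieval index"},
              {"q1": "retrieval index"}, {"q1": {"d2"}}),
}

cache = IndexCache()
results = evaluate_suite(suite, overlap_pipelines, cache)
evaluate_suite(suite, overlap_pipelines, cache)  # second run reuses cached indices
print(results, "index builds:", cache.builds)
```

The point of the pattern is that only `overlap_pipelines` is model-specific. Swapping in a different retriever means writing a different generator, while dataset iteration, index reuse, scoring, and aggregation stay fixed, which is what keeps results comparable across models.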
Taxonomy
Topics: Information Retrieval and Search Behavior · Libraries and Information Services · Research Data Management Practices
