AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

Jianlyu Chen; Nan Wang; Chaofan Li; Bo Wang; Shitao Xiao; Han Xiao; Hao Liao; Defu Lian; Zheng Liu

arXiv:2412.13102 · cs.IR · July 25, 2025

TL;DR

AIR-Bench introduces an automated, diverse, and evolving benchmark for information retrieval evaluation, leveraging large language models to generate high-quality test data across multiple domains and languages.

Contribution

The paper presents an automated, dynamic benchmark for IR evaluation that reduces reliance on human labeling and covers diverse tasks, domains, and languages.

Findings

01. Generated data aligns well with human-labeled data.

02. AIR-Bench covers diverse tasks, domains, and languages.

03. Benchmark resources are publicly available.

Abstract

Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address this challenge, we propose the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench). AIR-Bench is distinguished by three key features: 1) Automated. The testing data in AIR-Bench is automatically generated by large language models (LLMs) without human intervention. 2) Heterogeneous. The testing data in AIR-Bench is generated with respect to diverse tasks, domains and languages. 3) Dynamic. The domains and languages covered by AIR-Bench are constantly augmented to provide an increasingly comprehensive evaluation benchmark for community developers. We develop a reliable and robust…

Code & Models

Repositories

air-bench/air-bench (Official)

Taxonomy

Topics: Information Retrieval and Search Behavior