MIRB: Mathematical Information Retrieval Benchmark

Haocheng Ju; Bin Dong

arXiv:2505.15585 · cs.IR · May 22, 2025

TL;DR

MIRB is a benchmark for evaluating retrieval models on four mathematical information retrieval tasks spanning 12 datasets.

Contribution

The paper introduces MIRB, a unified benchmark covering multiple tasks and datasets, to standardize the evaluation of MIR models.

Findings

1. 13 models evaluated on MIRB
2. Identified key challenges in MIR tasks
3. Benchmark covers 4 task types and 12 datasets

Abstract

Mathematical Information Retrieval (MIR) is the task of retrieving information from mathematical documents and plays a key role in various applications, including theorem search in mathematical libraries, answer retrieval on math forums, and premise selection in automated theorem proving. However, a unified benchmark for evaluating these diverse retrieval tasks has been lacking. In this paper, we introduce MIRB (Mathematical Information Retrieval Benchmark) to assess the MIR capabilities of retrieval models. MIRB includes four tasks: semantic statement retrieval, question-answer retrieval, premise retrieval, and formula retrieval, spanning a total of 12 datasets. We evaluate 13 retrieval models on this benchmark and analyze the challenges inherent to MIR. We hope that MIRB provides a comprehensive framework for evaluating MIR systems and helps advance the development of more effective…
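To make the evaluation setting concrete, here is a minimal sketch of how a ranked list returned by a retrieval model might be scored for a single query. The abstract does not state which metrics MIRB reports; nDCG@k and Recall@k are assumed here only because they are common in retrieval benchmarks. The document identifiers (`thm_2`, `thm_7`, and so on) and the relevance judgments are hypothetical toy data, not taken from MIRB.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a list of graded relevances in rank order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query. `qrels` maps document id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k=10):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


# Hypothetical toy data: one query with two relevant statements.
qrels = {"thm_2": 1, "thm_7": 1}
ranking = ["thm_5", "thm_2", "thm_9", "thm_7"]  # model output, best first

print(f"nDCG@10   = {ndcg_at_k(ranking, qrels):.4f}")
print(f"Recall@2  = {recall_at_k(ranking, qrels, k=2):.2f}")
```

A benchmark score would then typically be the mean of such per-query values over every query in a dataset, which is again an assumption about common practice rather than a statement about MIRB's protocol.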

Code & Models

Repositories

j991222/mirb (official)

Taxonomy

Topics: Mathematics, Computing, and Information Processing