MIRB: Mathematical Information Retrieval Benchmark
Haocheng Ju, Bin Dong

TL;DR
MIRB is a unified benchmark for evaluating retrieval models across diverse mathematical information retrieval (MIR) tasks, intended to support progress in the field.
Contribution
The paper introduces MIRB, a unified benchmark spanning four tasks and 12 datasets, to standardize the evaluation of MIR models and advance research in mathematical information retrieval.
Findings
13 models evaluated on MIRB
Identified key challenges in MIR tasks
Benchmark covers 4 task types and 12 datasets
Abstract
Mathematical Information Retrieval (MIR) is the task of retrieving information from mathematical documents and plays a key role in various applications, including theorem search in mathematical libraries, answer retrieval on math forums, and premise selection in automated theorem proving. However, a unified benchmark for evaluating these diverse retrieval tasks has been lacking. In this paper, we introduce MIRB (Mathematical Information Retrieval Benchmark) to assess the MIR capabilities of retrieval models. MIRB includes four tasks: semantic statement retrieval, question-answer retrieval, premise retrieval, and formula retrieval, spanning a total of 12 datasets. We evaluate 13 retrieval models on this benchmark and analyze the challenges inherent to MIR. We hope that MIRB provides a comprehensive framework for evaluating MIR systems and helps advance the development of more effective…
Taxonomy
Topics: Mathematics, Computing, and Information Processing
