Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao

TL;DR
This paper introduces AoPS-Instruct, a large dataset of Olympiad-level math QA pairs extracted from the Art of Problem Solving (AoPS) forum, and LiveAoPSBench, a contamination-resistant benchmark, to improve and reliably evaluate LLMs' advanced math reasoning abilities.
Contribution
It presents an automated pipeline for extracting high-quality math QA pairs from the AoPS forum and creates a dynamic, contamination-resistant benchmark for LLM evaluation.
Findings
Fine-tuning on AoPS-Instruct enhances LLM reasoning skills.
LLM performance on the benchmark declines over time, i.e., it is lower on more recently posted problems.
Pre-training exposure may influence LLM success on older problems.
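The contamination-resistance idea behind these findings can be illustrated with a time-based split: a model's accuracy on problems posted after its training cutoff cannot be explained by having seen them during pre-training, and tracking accuracy per posting period shows whether it drops on newer problems. The following is a minimal Python sketch under stated assumptions; the `Problem` record, the cutoff date, and the quarterly bucketing are hypothetical illustrations rather than the actual LiveAoPSBench implementation, and the toy data are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
import math


@dataclass
class Problem:
    # Hypothetical record: a forum problem, its posting date, and whether the model solved it.
    problem_id: str
    posted: date
    solved: bool


def split_by_cutoff(problems, cutoff):
    """Partition problems into those posted on/before vs. after a model's training cutoff."""
    before = [p for p in problems if p.posted <= cutoff]
    after = [p for p in problems if p.posted > cutoff]
    return before, after


def accuracy(problems):
    """Fraction of problems solved; NaN for an empty set."""
    if not problems:
        return math.nan
    return sum(p.solved for p in problems) / len(problems)


def accuracy_by_quarter(problems):
    """Group problems by (year, quarter) of posting and compute accuracy per group."""
    buckets = defaultdict(list)
    for p in problems:
        quarter = (p.posted.month - 1) // 3 + 1
        buckets[(p.posted.year, quarter)].append(p)
    return {key: accuracy(group) for key, group in sorted(buckets.items())}


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [
        Problem("a", date(2023, 2, 1), True),
        Problem("b", date(2023, 5, 9), True),
        Problem("c", date(2024, 1, 15), False),
        Problem("d", date(2024, 3, 2), True),
    ]
    old, new = split_by_cutoff(toy, cutoff=date(2023, 12, 31))
    print(accuracy(old), accuracy(new))  # 1.0 0.5
    print(accuracy_by_quarter(toy))
```

A gap between the pre-cutoff and post-cutoff accuracy, or a downward trend across quarters, is the kind of signal that suggests success on older problems may partly reflect pre-training exposure rather than reasoning ability alone.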
Abstract
Advances in Large Language Models (LLMs) have sparked interest in their ability to solve Olympiad-level math problems. However, the training and evaluation of these models are constrained by the limited size and quality of available datasets, as creating large-scale data for such advanced problems requires extensive effort from human experts. In addition, current benchmarks are prone to contamination, leading to unreliable evaluations. In this paper, we present an automated pipeline that leverages the rich resources of the Art of Problem Solving (AoPS) forum, which predominantly features Olympiad-level problems and community-driven solutions. Using open-source LLMs, we develop a method to extract question-answer pairs from the forum, resulting in AoPS-Instruct, a dataset of more than 600,000 high-quality QA pairs. Our experiments demonstrate that fine-tuning LLMs on AoPS-Instruct…
