Aryabhata: An exam-focused language model for JEE Math
Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma

TL;DR
Aryabhata 1.0 is a specialized 7B-parameter language model optimized for JEE math. It combines open-weight reasoning models, curriculum fine-tuning, and reinforcement learning to improve both answer accuracy and the pedagogical quality of its reasoning.
Contribution
The paper introduces Aryabhata 1.0, a novel exam-focused math reasoning model tailored to JEE preparation and built by combining open-weight models, curriculum learning, and reinforcement learning.
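Aryabhata's starting checkpoint is obtained by merging open-weight reasoning models, but this summary does not specify the merge method. As a minimal sketch, assuming simple linear (weighted-average) merging of models that share one architecture, the idea looks like this. `linear_merge` is a hypothetical helper, and parameters are represented as flat lists of floats rather than real tensors.

```python
from typing import Optional


def linear_merge(
    state_dicts: list[dict[str, list[float]]],
    weights: Optional[list[float]] = None,
) -> dict[str, list[float]]:
    """Hypothetical helper: merge models by a weighted average of matching
    parameters. Each state dict maps a parameter name to a flat list of floats."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # uniform average by default
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")

    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("models must share the same parameter names")

    merged = {}
    for name in keys:
        params = [sd[name] for sd in state_dicts]
        length = len(params[0])
        if any(len(p) != length for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all models.
        merged[name] = [sum(w * p[i] for w, p in zip(weights, params)) for i in range(length)]
    return merged


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0]}
    model_b = {"w": [3.0, 4.0]}
    print(linear_merge([model_a, model_b]))  # {'w': [2.0, 3.0]}
```

Linear averaging is only one merging strategy. Methods such as task arithmetic or TIES merging combine parameters differently, and nothing here indicates which one the authors used.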
Findings
Aryabhata outperforms existing models in accuracy and efficiency on JEE benchmarks.
It provides pedagogically useful step-by-step reasoning.
The model is openly released for community use and feedback.
Abstract
We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for India's Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and…
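Two pieces of this pipeline lend themselves to a small illustration. The first is filtering best-of-n samples down to traces whose final answer verifies. The second is group-relative advantage estimation, where each sampled completion's reward is compared with the other completions for the same problem. The sketch below assumes binary verifiable rewards (1 for a correct final answer, 0 otherwise) and GRPO-style mean/standard-deviation normalization. The function names, the answer-extraction rule, and the `eps` constant are illustrative assumptions, not the authors' implementation. Adaptive Group Resizing and Temperature Scaling are not modeled.

```python
import statistics
from typing import Callable


def best_of_n_verified(
    problem: str,
    reference: str,
    sample_fn: Callable[[str], str],
    extract_answer: Callable[[str], str],
    n: int = 4,
) -> list[str]:
    """Hypothetical helper: sample n chain-of-thought traces and keep only
    those whose extracted final answer matches the reference answer."""
    traces = [sample_fn(problem) for _ in range(n)]
    return [t for t in traces if extract_answer(t) == reference]


def verifiable_rewards(
    traces: list[str], reference: str, extract_answer: Callable[[str], str]
) -> list[float]:
    """Assumed binary reward: 1.0 if the extracted answer is correct, else 0.0."""
    return [1.0 if extract_answer(t) == reference else 0.0 for t in traces]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the mean and (population) std of its group.
    A group where every sample gets the same reward yields all-zero advantages."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    def extract(trace: str) -> str:
        # Hypothetical extractor: take the text after the last "= ".
        return trace.rsplit("= ", 1)[-1].strip()

    group = ["... so x = 3", "... so x = 5", "... so x = 3", "... so x = 4"]
    samples = iter(group)

    kept = best_of_n_verified("Solve 2x = 6", "3", lambda p: next(samples), extract, n=4)
    print(kept)  # ['... so x = 3', '... so x = 3']

    rewards = verifiable_rewards(group, "3", extract)
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

The zero-advantage case matters in this setting. If every sample for a problem is right, or every one is wrong, that problem contributes no policy-gradient signal. How the paper's Adaptive Group Resizing and Temperature Scaling respond to this is not described in this summary.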
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper adopts a standard fine-tuning (post-training) pipeline, which is a reasonable approach for developing a domain-expert model tailored to a specific application scenario. 2. The model demonstrates superior performance on the JEE benchmark compared to other existing models.
1. The paper does not introduce any new methods or insights; it merely combines existing techniques. 2. The paper lacks comprehensive ablation studies to analyze the contribution of each component. For instance, it remains unclear how much of the performance gain comes from Model Merging, SFT, or RLVR. Questions such as how the model would perform without Model Merging or without SFT are left unanswered. 3. The paper does not evaluate the model beyond mathematics, despite claiming that “we r
1. End-to-end. This paper covers various mainstream technologies, from model merging, to data acquisition, and then to different training methods such as SFT and RL.
1. This paper does not address any scientific questions. Although it employs a variety of techniques, it reads more like a technical report, and the scientific motivation for using LLMs to solve JEE problems is not clearly articulated. 2. The paper lacks baseline methods. It does not present the performance after model merging or the results at each training stage, making it unclear which techniques are actually effective. 3. The paper does not conduct ablation studies on the methods used. The
1. The paper is well-written and easy to follow. 2. It presents an engineering-oriented approach to training a reasoning model for India’s Joint Entrance Examination (JEE), combining model merging, supervised fine-tuning, and reinforcement learning in a clear and organized manner.
1. The proposed methodology is mainly engineering-oriented, without addressing a clear research challenge or presenting substantial methodological novelty. The techniques described, such as group-relative advantage estimation and exploration strategies like adaptive group resizing and temperature scaling, have been explored in prior works and appear incremental. 2. More importantly, the paper lacks ablation studies to validate the effectiveness of the proposed components. Without such analysis,
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Explainable Artificial Intelligence (XAI)
