Aryabhata: An exam-focused language model for JEE Math

Ritvik Rastogi; Sachin Dharashivkar; Sandeep Varma

arXiv:2508.08665 · cs.AI · August 14, 2025

TL;DR

Aryabhata 1.0 is a specialized 7B parameter language model optimized for JEE math, combining open-weight reasoning models, curriculum fine-tuning, and reinforcement learning to enhance accuracy and pedagogical reasoning for educational purposes.

Contribution

The paper introduces Aryabhata 1.0, a novel exam-focused math reasoning model built with a combination of open-weight models, curriculum learning, and reinforcement learning, tailored for JEE exam preparation.
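
As a rough illustration of what "combining open-weight models" can mean in practice, the sketch below merges checkpoints by weighted parameter averaging. This is an assumption made for illustration: the summary here does not state which merging method the authors used, and the function name `merge_linear`, the coefficient list, and the toy dictionaries of floats standing in for real tensors are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
# Real checkpoints hold tensors; plain lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def merge_linear(models: List[Weights], coeffs: List[float]) -> Weights:
    """Weighted average of parameters that share names and shapes.

    Assumption: linear interpolation is used only as an example of model
    merging; it is not claimed to be the method used for Aryabhata 1.0.
    """
    if not models or len(models) != len(coeffs):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")

    merged: Weights = {}
    for name, reference in models[0].items():
        merged[name] = [
            # Normalise by the coefficient sum so the result is a true average.
            sum(c * m[name][i] for c, m in zip(coeffs, models)) / total
            for i in range(len(reference))
        ]
    return merged


if __name__ == "__main__":
    model_a = {"layer.weight": [1.0, 3.0]}
    model_b = {"layer.weight": [3.0, 5.0]}
    print(merge_linear([model_a, model_b], [1.0, 1.0]))
```

Equal coefficients give a plain average; unequal ones bias the merged model toward one parent, which is the main knob such a scheme exposes.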

Findings

01. Aryabhata outperforms existing models in accuracy and efficiency on JEE benchmarks.

02. It provides pedagogically useful step-by-step reasoning.

03. The model is openly released for community use and feedback.

Abstract

We present Aryabhata 1.0, a compact 7B parameter math reasoning model optimized for the Indian academic exam, the Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-$n$ rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and…
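
Two steps named in the abstract, rejection sampling of verified CoT traces and group-relative advantage estimation, both rest on a checkable reward. The sketch below shows one plausible shape for each. It makes several assumptions that are not taken from the paper: the verifier is an exact string match on the final answer, the advantage is normalised by the group's mean and standard deviation, and the names `verify`, `rejection_sample`, and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def verify(answer: str, reference: str) -> float:
    """Verifiable reward (assumed form): 1.0 on exact match, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def rejection_sample(traces: List[Tuple[str, str]], reference: str) -> List[str]:
    """Best-of-n style filter: keep the reasoning text of every sampled
    (reasoning, final_answer) pair whose final answer verifies as correct."""
    return [cot for cot, answer in traces if verify(answer, reference) == 1.0]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each sample against the other samples for the same question.

    Assumption: mean/std normalisation within the group; the abstract does
    not spell out the exact normalisation used.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    samples = [("reasoning A", "42"), ("reasoning B", "41"),
               ("reasoning C", "40"), ("reasoning D", " 42 ")]
    kept = rejection_sample(samples, "42")
    rewards = [verify(answer, "42") for _, answer in samples]
    print(kept)
    print(group_relative_advantages(rewards))
```

When every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal. That is one plausible motivation for exploration strategies that vary the group size or sampling temperature, though this summary does not describe how Adaptive Group Resizing or Temperature Scaling actually work.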

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The paper adopts a standard fine-tuning (post-training) pipeline, which is a reasonable approach for developing a domain-expert model tailored to a specific application scenario.
2. The model demonstrates superior performance on the JEE benchmark compared to other existing models.

Weaknesses

1. The paper does not introduce any new methods or insights; it merely combines existing techniques.
2. The paper lacks comprehensive ablation studies to analyze the contribution of each component. For instance, it remains unclear how much of the performance gain comes from Model Merging, SFT, or RLVR. Questions such as how the model would perform without Model Merging or without SFT are left unanswered.
3. The paper does not evaluate the model beyond mathematics, despite claiming that “we r

Reviewer 02 · Rating 2 · Confidence 5

Strengths

1. End-to-end. This paper covers a range of mainstream techniques, from model merging to data acquisition to training methods such as SFT and RL.

Weaknesses

1. This paper does not address any scientific questions. Although it employs a variety of techniques, it reads more like a technical report, and the scientific motivation for using LLMs to solve JEE problems is not clearly articulated.
2. The paper lacks baseline methods. It does not present the performance after model merging or the results at each training stage, making it unclear which techniques are actually effective.
3. The paper does not conduct ablation studies on the methods used. The

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The paper is well-written and easy to follow.
2. It presents an engineering-oriented approach to training a reasoning model for India’s Joint Entrance Examination (JEE), combining model merging, supervised fine-tuning, and reinforcement learning in a clear and organized manner.

Weaknesses

1. The proposed methodology is mainly engineering-oriented, without addressing a clear research challenge or presenting substantial methodological novelty. The techniques described, such as group-relative advantage estimation and exploration strategies like adaptive group resizing and temperature scaling, have been explored in prior works and appear incremental.
2. More importantly, the paper lacks ablation studies to validate the effectiveness of the proposed components. Without such analysis,

Code & Models

Models

🤗
PhysicsWallahAI/Aryabhata-1.0
model · 286 dl · ♡ 110

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Explainable Artificial Intelligence (XAI)