ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects
Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari

TL;DR
ParamBench is a comprehensive Hindi-language benchmark with over 17,000 graduate-level questions across 21 Indian subjects, designed to evaluate LLM understanding of culturally grounded, complex disciplinary topics.
Contribution
This paper introduces ParamBench, a novel large-scale, culturally specific benchmark for assessing LLMs on Indian graduate-level questions across diverse formats and subjects.
Findings
Gemma3-27B achieves 56.4% accuracy overall.
LLMs perform poorly on music, instruments, and law topics.
Culturally grounded reasoning remains a challenge for current LLMs.
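As a rough illustration of how an overall score and per-subject scores such as those above might be tallied, the following minimal sketch computes both from a list of graded answers. The function name `accuracy_by_subject`, the record layout, and the toy data are all hypothetical and not taken from the paper. The sketch assumes the overall figure is a micro-average over all questions, which this summary does not confirm.

```python
from collections import defaultdict


def accuracy_by_subject(records):
    """Return (overall_accuracy, {subject: accuracy}).

    `records` is an iterable of (subject, predicted, gold) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in records:
        total[subject] += 1
        if predicted == gold:
            correct[subject] += 1

    # Per-subject accuracy: fraction of questions answered correctly.
    per_subject = {s: correct[s] / total[s] for s in total}

    # Overall accuracy as a micro-average over all questions (assumed).
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_subject


# Made-up toy records, NOT ParamBench data.
toy_records = [
    ("history", "A", "A"),
    ("history", "B", "B"),
    ("music", "C", "A"),
    ("law", "D", "D"),
    ("law", "A", "B"),
]

overall, per_subject = accuracy_by_subject(toy_records)
print(f"overall: {overall:.1%}")
for subject, acc in sorted(per_subject.items()):
    print(f"{subject}: {acc:.1%}")
```

Reporting the per-subject breakdown alongside the overall number is what makes weak areas visible; a single aggregate would hide them.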
Abstract
Large language models have been widely evaluated on tasks such as comprehension, summarization, code generation, etc. However, their performance on graduate-level, culturally grounded questions in the Indian context remains largely unexplored. Existing Indian benchmarks emphasise basic fact-orientated queries that offer limited assessment of a deeper disciplinary understanding tailored to the Indian setting. In this paper, we present ParamBench, consisting of more than 17K questions in the Hindi language, comprising questionnaires from 21 diverse subjects. These questions are primarily derived from a nationwide graduate-level entrance examination covering topics such as history, music, instruments, yoga, literature, philosophy, law, etc., specifically for the Indian context. Additionally, we assess the ability of LLMs to handle diverse question formats - such as list-based matching,…
