ChemPro: A Progressive Chemistry Benchmark for Large Language Models

Aaditya Baranwal; Shruti Vyas

arXiv:2602.03108 · cs.CL · April 22, 2026


TL;DR

ChemPro is a progressive chemistry benchmark of 4100 question-answer pairs, organized into four sections of increasing difficulty, that evaluates large language models across biochemistry, inorganic, organic, and physical chemistry and exposes where their accuracy holds up and where it breaks down.

Contribution

Introduction of ChemPro, a structured chemistry benchmark with increasing difficulty to assess LLMs' proficiency in general chemistry understanding.

Findings

01. LLMs perform well on basic chemistry questions.

02. Accuracy declines as question complexity increases.

03. The results highlight the need for more robust LLM methodologies.
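
The accuracy trend described in the findings could be measured with a per-section tally along the following lines. This is a minimal sketch, not the authors' evaluation code: the record fields (`section`, `kind`, `answer`, `prediction`), the `"mcq"` label, and the 1% relative tolerance for numerical answers are all assumptions made for illustration.

```python
from collections import defaultdict
from math import isclose


def is_correct(kind, answer, prediction, rel_tol=0.01):
    """Score one item (hypothetical scheme, not taken from the paper)."""
    if kind == "mcq":
        # Multiple-choice: compare option labels, ignoring case and whitespace.
        return prediction.strip().upper() == answer.strip().upper()
    try:
        # Numerical: accept predictions within an assumed relative tolerance.
        return isclose(float(prediction), float(answer), rel_tol=rel_tol)
    except ValueError:
        # A prediction that cannot be parsed as a number counts as wrong.
        return False


def accuracy_by_section(records):
    """Return {section: accuracy} for an iterable of scored-item records."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["section"]] += 1
        hits[r["section"]] += is_correct(r["kind"], r["answer"], r["prediction"])
    return {section: hits[section] / totals[section] for section in sorted(totals)}
```

Calling `accuracy_by_section` on a list of such records returns one accuracy value per difficulty section, which makes a decline from the easier sections to the harder ones directly visible.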

Abstract

We introduce ChemPro, a progressive benchmark with 4100 natural-language question-answer pairs in Chemistry, organized into 4 coherent sections of difficulty and designed to assess the proficiency of Large Language Models (LLMs) in a broad spectrum of general chemistry topics. We include Multiple Choice Questions and Numerical Questions spread across fine-grained information recall, long-horizon reasoning, multi-concept questions, problem-solving with nuanced articulation, and straightforward questions in a balanced ratio, effectively covering Bio-Chemistry, Inorganic-Chemistry, Organic-Chemistry and Physical-Chemistry. ChemPro is carefully designed to mirror a student's academic evaluation, from basic to high-school chemistry. A gradual increase in question difficulty rigorously tests the ability of LLMs to progress from solving basic problems to solving more sophisticated challenges. We…
