AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan

TL;DR
AtmosSci-Bench is a comprehensive benchmark designed to evaluate large language models' performance across key atmospheric science domains, facilitating systematic assessment of reasoning and problem-solving abilities in climate-related tasks.
Contribution
This paper introduces AtmosSci-Bench, a novel, dual-format benchmark for systematically evaluating LLMs in atmospheric science, including diverse problem generation and comprehensive model analysis.
Findings
LLMs show varied reasoning capabilities across atmospheric science categories.
Instruction-tuned models outperform domain-specific models in certain tasks.
The benchmark enables scalable and in-depth evaluation of LLMs in climate science.
Abstract
The rapid advancements in large language models (LLMs), particularly in their reasoning capabilities, hold transformative potential for addressing complex challenges and boosting scientific discovery in atmospheric science. However, leveraging LLMs effectively in this domain requires a robust and comprehensive evaluation benchmark. Toward this end, we present AtmosSci-Bench, a novel benchmark designed to systematically assess LLM performance across five core categories of atmospheric science problems: hydrology, atmospheric dynamics, atmospheric physics, geophysics, and physical oceanography. AtmosSci-Bench features a dual-format design comprising both multiple-choice questions (MCQs) and open-ended questions (OEQs), enabling scalable automated evaluation alongside deeper analysis of conceptual understanding. We employ a template-based MCQ generation framework to create diverse,…
Taxonomy
Topics: Hydrological Forecasting Using AI · Topic Modeling · Meteorological Phenomena and Simulations
