ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky, Spencer Ho, Rose Yu, Duncan Watson-Parris, Yian Ma, Leon Bergen, Taylor Berg-Kirkpatrick

TL;DR
ClimaQA introduces an automated evaluation framework and benchmark datasets for assessing climate question answering models, addressing the lack of comprehensive evaluation tools for LLMs applied to climate science.
Contribution
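The paper presents ClimaGen, an adaptive learning framework that generates question-answer pairs from graduate textbooks with climate scientists in the loop, along with two datasets for evaluating climate LLMs: the expert-annotated ClimaQA-Gold and the large-scale synthetic ClimaQA-Silver.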
Findings
Different LLMs show varied performance on climate QA benchmarks.
The datasets enable more rigorous assessment of climate knowledge in LLMs.
Evaluation strategies reveal strengths and weaknesses of current climate LLMs.
Abstract
The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop ClimaGen (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from graduate textbooks with climate scientists in the loop. As a result, we present ClimaQA-Gold, an expert-annotated benchmark dataset alongside ClimaQA-Silver, a large-scale, comprehensive synthetic QA dataset for climate science. Finally, we develop evaluation strategies and compare different LLMs on our benchmarks. Our results offer novel insights into various approaches used to enhance knowledge of climate LLMs. The source code is publicly available at…
Taxonomy
Topics: CO2 Sequestration and Geologic Interactions
