Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change
Murathan Kurfalı, Shorouq Zahra, Joakim Nivre, Gabriele Messori

TL;DR
Climate-Eval is a benchmark of 25 climate-related NLP tasks drawn from 13 datasets. It is used to evaluate open-source large language models (2B to 70B parameters) in zero-shot and few-shot settings, showing their capabilities and limitations on climate change discourse.
Contribution
It provides the first extensive benchmark for climate change NLP tasks and systematically assesses open-source LLMs in this domain.
Findings
Open-source LLMs show varied performance across climate tasks.
Zero-shot and few-shot settings reveal strengths and limitations of models.
Benchmark enables standardized evaluation of climate-related NLP models.
Abstract
Climate-Eval is a comprehensive benchmark designed to evaluate natural language processing models across a broad range of tasks related to climate change. Climate-Eval aggregates existing datasets along with a newly developed news classification dataset, created specifically for this release. This results in a benchmark of 25 tasks based on 13 datasets, covering key aspects of climate discourse, including text classification, question answering, and information extraction. Our benchmark provides a standardized evaluation suite for systematically assessing the performance of large language models (LLMs) on these tasks. Additionally, we conduct an extensive evaluation of open-source LLMs (ranging from 2B to 70B parameters) in both zero-shot and few-shot settings, analyzing their strengths and limitations in the domain of climate change.
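The zero-shot and few-shot settings mentioned in the abstract differ only in whether labeled demonstrations precede the query in the prompt. The following is a minimal sketch of that distinction for a text classification task. The function name `build_prompt`, the prompt wording, and the example labels are illustrative assumptions; they are not taken from the Climate-Eval release.

```python
from typing import Sequence, Tuple


def build_prompt(
    text: str,
    labels: Sequence[str],
    shots: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a classification prompt (hypothetical template, not the paper's).

    With an empty `shots` the prompt is zero-shot; each (text, label) pair in
    `shots` adds one labeled demonstration, giving a few-shot prompt.
    """
    # Instruction line listing the candidate labels.
    blocks = [f"Classify the text into one of: {', '.join(labels)}."]
    # Labeled demonstrations (few-shot only).
    for shot_text, shot_label in shots:
        blocks.append(f"Text: {shot_text}\nLabel: {shot_label}")
    # The query, with the label left open for the model to complete.
    blocks.append(f"Text: {text}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    labels = ["climate", "not climate"]  # assumed label set for illustration
    query = "Sea levels are rising."
    print(build_prompt(query, labels))  # zero-shot
    print("---")
    print(build_prompt(query, labels, [("The match ended 2-1.", "not climate")]))  # one-shot
```

In this sketch the model's completion after the final `Label:` would be compared against the gold label. How a real benchmark scores the output (for example, by comparing label likelihoods rather than generated text) is a separate design choice not covered here.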
