ClimateX: Do LLMs Accurately Assess Human Expert Confidence in Climate Statements?

Romain Lacombe; Kerrie Wu; Eddie Dilworth

arXiv:2311.17107 · cs.LG · November 30, 2023 · 2 cites

TL;DR

This paper introduces ClimateX, a dataset of climate statements with expert confidence labels, and evaluates LLMs' ability to assess human expert confidence, revealing limited accuracy and over-confidence issues.

Contribution

The paper presents ClimateX, a new expert-labeled dataset for climate statement confidence, and evaluates LLMs' performance in classifying expert confidence levels.

Findings

01. LLMs achieve up to 47% accuracy in classifying confidence levels.

02. Models tend to be over-confident on low and medium confidence statements.

03. Few-shot learning improves classification performance.
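
To make the few-shot setting concrete, here is a minimal sketch of how labeled examples might be assembled into a classification prompt. The function name `build_few_shot_prompt`, the prompt wording, the four-level label scale, and the placeholder statements are all assumptions for illustration; they are not taken from the paper or its repository.

```python
def build_few_shot_prompt(examples, statement, levels):
    """Assemble a plain-text prompt from labeled examples plus one unlabeled statement."""
    header = (
        "Classify the confidence level of each statement as one of: "
        + ", ".join(levels)
        + "."
    )
    # Each labeled example ("shot") shows the model the expected input/output format.
    shots = [f"Statement: {text}\nConfidence: {label}" for text, label in examples]
    # The final block leaves the label blank for the model to complete.
    query = f"Statement: {statement}\nConfidence:"
    return "\n\n".join([header, *shots, query])


# Assumed label scale; the summary above only names "low" and "medium" explicitly.
LEVELS = ["low", "medium", "high", "very high"]

# Placeholder texts, not real dataset entries.
examples = [
    ("<example statement A>", "high"),
    ("<example statement B>", "low"),
]

prompt = build_few_shot_prompt(examples, "<statement to classify>", LEVELS)
print(prompt)
```

Passing an empty `examples` list yields the corresponding zero-shot prompt, which gives a simple way to compare the two settings under the same wording.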

Abstract

Evaluating the accuracy of outputs generated by Large Language Models (LLMs) is especially important in the climate science and policy domain. We introduce the Expert Confidence in Climate Statements (ClimateX) dataset, a novel, curated, expert-labeled dataset consisting of 8094 climate statements collected from the latest Intergovernmental Panel on Climate Change (IPCC) reports, labeled with their associated confidence levels. Using this dataset, we show that recent LLMs can classify human expert confidence in climate-related statements, especially in a few-shot learning setting, but with limited (up to 47%) accuracy. Overall, models exhibit consistent and significant over-confidence on low and medium confidence statements. We highlight implications of our results for climate communication, LLM evaluation strategies, and the use of LLMs in information retrieval systems.
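
The two quantities the abstract reports, classification accuracy and over-confidence on lower-confidence statements, can be illustrated with a short sketch. It assumes an ordinal four-level label scale and measures over-confidence as the mean signed rank difference between the predicted and expert labels; both choices are assumptions made here, not necessarily the paper's exact metrics, and the toy data is hypothetical.

```python
from collections import defaultdict

# Assumed ordinal label scale; adjust to the dataset's actual labels.
LEVELS = ["low", "medium", "high", "very high"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def accuracy(true_labels, predicted_labels):
    """Fraction of statements whose predicted level matches the expert label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def mean_bias_by_level(true_labels, predicted_labels):
    """Mean signed rank error (predicted - true) per true level.

    A positive value means the model tends to predict a higher
    confidence level than the experts assigned (over-confidence).
    """
    errors = defaultdict(list)
    for t, p in zip(true_labels, predicted_labels):
        errors[t].append(RANK[p] - RANK[t])
    return {level: sum(v) / len(v) for level, v in errors.items()}


# Hypothetical toy data for illustration only (not results from the paper).
truth = ["low", "low", "medium", "medium", "high", "very high"]
preds = ["medium", "high", "high", "medium", "high", "very high"]

toy_accuracy = accuracy(truth, preds)
toy_bias = mean_bias_by_level(truth, preds)
print(toy_accuracy)  # 0.5
print(toy_bias)      # {'low': 1.5, 'medium': 0.5, 'high': 0.0, 'very high': 0.0}
```

In this toy example the positive bias on the "low" and "medium" rows is the kind of pattern the abstract describes as over-confidence; a per-level breakdown makes it visible where a single accuracy number would hide it.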

Code & Models

Repositories

rlacombe/climatex (PyTorch, official)

Taxonomy

Topics: Climate Change Communication and Perception · Computational and Text Analysis Methods · Expert finding and Q&A systems