Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change

Mowafak Allaham; Ayse D. Lokmanoglu; P. Sol Hart; Erik C. Nisbet

arXiv:2501.13802·cs.CY·March 11, 2025

Open Access

TL;DR

This paper evaluates the ability of large language models to classify climate misinformation, emphasizing the importance of human oversight and expert-annotated data for effective governance and misinformation detection.

Contribution

It demonstrates that fine-tuning GPT-3.5-turbo on expert-annotated datasets achieves expert-level accuracy in climate misinformation classification, outperforming open-source models and existing tools.

Findings

1. Open-source models underperform proprietary models in climate misinformation classification.

2. Existing climate-focused tools outperform many proprietary LLMs, including GPT-4o.

3. Fine-tuning GPT-3.5-turbo on expert data achieves expert-level classification accuracy.

Abstract

Climate misinformation is a problem that could be substantially aggravated by the development of Large Language Models (LLMs). In this study, we evaluate the potential for LLMs to be part of the solution for mitigating online dis/misinformation rather than part of the problem. Employing a public expert-annotated dataset and a curated sample of social media content, we evaluate the performance of proprietary vs. open-source LLMs on a climate misinformation classification task, comparing them to existing climate-focused computer-assisted tools and expert assessments. Results show (1) open-source models substantially under-perform in classifying climate misinformation compared to proprietary models, (2) existing climate-focused computer-assisted tools leveraging expert-annotated datasets continue to outperform many proprietary models, including GPT-4o, and (3) demonstrate the…
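To make the comparison against expert assessments concrete, the following is a minimal sketch of scoring a classifier's labels against expert labels. The label names, the toy data, and the choice of accuracy and macro-averaged F1 are illustrative assumptions; the abstract does not state which metrics or label scheme the authors used.

```python
def accuracy(expert, predicted):
    """Share of items where the model label matches the expert label."""
    if len(expert) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not expert:
        raise ValueError("label lists must not be empty")
    return sum(e == p for e, p in zip(expert, predicted)) / len(expert)


def macro_f1(expert, predicted):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(expert) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not expert:
        raise ValueError("label lists must not be empty")
    pairs = list(zip(expert, predicted))
    labels = sorted(set(expert) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(e == label and p == label for e, p in pairs)
        fp = sum(e != label and p == label for e, p in pairs)
        fn = sum(e == label and p != label for e, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: these labels are invented for illustration only.
expert_labels = ["misleading", "accurate", "misleading", "accurate"]
model_labels = ["misleading", "accurate", "accurate", "accurate"]

print(f"accuracy: {accuracy(expert_labels, model_labels):.2f}")
print(f"macro F1: {macro_f1(expert_labels, model_labels):.2f}")
```

Macro averaging weights each class equally, which can matter when misleading claims are much rarer than accurate ones; whether that fits a given dataset is a judgment call.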

Taxonomy

Topics: Risk Perception and Management · Misinformation and Its Impacts · International Arbitration and Investment Law