Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions

David Thulke; Jakob Kemmler; Christian Dugast; Hermann Ney

arXiv:2505.15633 · cs.CL · May 22, 2025

TL;DR

This paper investigates how to improve the faithfulness of retrieval-augmented large language models on climate questions and introduces ClimateGPT Faithful+, which raises faithfulness to the retrieved passages from 30% to 57%.

Contribution

It explores the automatic assessment of faithfulness and develops ClimateGPT Faithful+ by excluding unfaithful subsets of ClimateGPT's instruction fine-tuning data, which makes the model's output more faithful to the retrieved passages.

Findings

01

ClimateGPT Faithful+ increases faithfulness from 30% to 57%.

02

Automatic faithfulness assessment correlates with human judgments.

03

Excluding unfaithful subsets of the instruction fine-tuning data improves the model's faithfulness.

Abstract

Large language models that use retrieval augmented generation have the potential to unlock valuable knowledge for researchers, policymakers, and the public by making long and technical climate-related documents more accessible. While this approach can help alleviate factual hallucinations by relying on retrieved passages as additional context, its effectiveness depends on whether the model's output remains faithful to these passages. To address this, we explore the automatic assessment of faithfulness of different models in this setting. We then focus on ClimateGPT, a large language model specialised in climate science, to examine which factors in its instruction fine-tuning impact the model's faithfulness. By excluding unfaithful subsets of the model's training data, we develop ClimateGPT Faithful+, which achieves an improvement in faithfulness from 30% to 57% in supported atomic…
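
The reported figure reads as a share of supported statements, which can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that an answer has already been split into atomic statements and that some verifier decides whether each one is supported by the retrieved passages. The names `faithfulness_score` and `toy_is_supported`, and the example texts, are hypothetical.

```python
from typing import Callable, List


def faithfulness_score(
    claims: List[str],
    passages: List[str],
    is_supported: Callable[[str, List[str]], bool],
) -> float:
    """Fraction of claims that the verifier marks as supported by the passages."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, passages))
    return supported / len(claims)


def toy_is_supported(claim: str, passages: List[str]) -> bool:
    """Placeholder verifier (assumption): every word of the claim must occur
    in a single passage. A real setup would use an entailment model or an
    LLM judge instead of word overlap."""
    claim_words = set(claim.lower().split())
    return any(claim_words <= set(p.lower().split()) for p in passages)


# Toy example texts, not taken from the paper.
passages = ["sea level rose by about 20 cm since 1900"]
claims = ["sea level rose since 1900", "sea level fell since 1900"]

score = faithfulness_score(claims, passages, toy_is_supported)
print(f"{score:.0%}")  # one of two claims is supported, so this prints 50%
```

Passing the verifier in as a function keeps the scoring logic independent of how support is judged, so the word-overlap placeholder can be swapped for a stronger model without touching the aggregation.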

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems
