ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures

Tobias Schimanski; Jingwei Ni; Roberto Spacey; Nicola Ranger; Markus Leippold

arXiv:2406.09818·cs.IR·October 2, 2024·1 cites

TL;DR

This paper introduces ClimRetrieve, a new dataset for evaluating domain-specific information retrieval from corporate climate disclosures, addressing a key gap in climate communication analysis.

Contribution

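The paper builds a benchmark of over 8.5K question-source-answer pairs, labeled by level of relevance, from 30 sustainability reports and 16 climate-related questions. It then uses the dataset to explore how expert knowledge can be integrated into embedding-based retrieval and highlights the limitations of current embeddings.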

Findings

01. Embedding-based retrieval with expert knowledge improves accuracy

02. Significant challenges remain in knowledge-intensive climate domains

03. Dataset enables targeted evaluation of climate information retrieval methods

Abstract

To handle the vast amounts of qualitative data produced in corporate climate communication, stakeholders increasingly rely on Retrieval Augmented Generation (RAG) systems. However, a significant gap remains in evaluating domain-specific information retrieval - the basis for answer generation. To address this challenge, this work simulates the typical tasks of a sustainability analyst by examining 30 sustainability reports with 16 detailed climate-related questions. As a result, we obtain a dataset with over 8.5K unique question-source-answer pairs labeled by different levels of relevance. Furthermore, we develop a use case with the dataset to investigate the integration of expert knowledge into information retrieval with embeddings. Although we show that incorporating expert knowledge works, we also outline the critical limitations of embeddings in knowledge-intensive downstream domains…
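
The retrieval setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the passages, relevance labels, question, and expert note are invented for illustration, and a bag-of-words count vector stands in for a real embedding model. It only shows the general idea of ranking report passages by similarity to a question, with and without appended expert knowledge, and scoring the ranking against graded relevance labels.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, passages):
    """Return passage indices sorted by descending similarity (ties by index)."""
    q = embed(query)
    scores = [(cosine(q, embed(p)), i) for i, p in enumerate(passages)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

def precision_at_k(ranking, relevance, k):
    """Share of the top-k passages whose relevance label is above zero."""
    return sum(1 for i in ranking[:k] if relevance[i] > 0) / k

# Hypothetical report passages with hypothetical graded relevance labels
# (0 = not relevant, higher = more relevant). Not taken from the dataset.
passages = [
    "the company assessed flood and drought exposure of its production sites",
    "our board approved a new dividend policy this year",
    "we describe how the company works with suppliers on packaging",
    "heat stress and water scarcity may disrupt operations in southern regions",
]
relevance = [2, 0, 0, 1]

question = "does the company describe physical climate risks"
# Hypothetical expert note appended to the question before embedding.
expert_note = "physical risks include flood drought heat stress water scarcity"
augmented = question + " " + expert_note

for label, query in [("question only", question), ("with expert note", augmented)]:
    ranking = rank(query, passages)
    print(label, ranking, precision_at_k(ranking, relevance, 2))
```

In this toy data the plain question ranks an irrelevant passage first because it shares generic words with it, while the augmented query places both relevant passages on top. A real evaluation would replace `embed` with an actual embedding model and the toy lists with the dataset's labeled pairs.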

Code & Models

Repositories

tobischimanski/climretrieve
Official

Videos

ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures

Taxonomy

Topics: Impact of AI and Big Data on Business and Society