ClimaText: A Dataset for Climate Change Topic Detection
Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, and Markus Leippold

TL;DR
ClimaText introduces a new dataset for climate change topic detection in text, highlighting the limitations of keyword models and the potential of context-based algorithms like BERT for complex, implicit climate-related content.
Contribution
This paper presents ClimaText, a publicly available dataset for sentence-level climate change detection, and evaluates different models on it, emphasizing the need for improved methods.
Findings
Keyword models are inadequate for complex climate topics
BERT can identify implicit and complex climate-related patterns
There is significant potential for improving climate change detection methods
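To make the first finding concrete, here is a minimal sketch of a keyword-based sentence detector. The keyword list, the function name `keyword_detect`, and the two example sentences are assumptions made for illustration; they are not taken from the paper or the dataset. The sketch only shows why a fixed keyword list may miss sentences that discuss climate change implicitly.

```python
import re

# Hypothetical keyword list, chosen for illustration only (not from the paper).
CLIMATE_KEYWORDS = (
    "climate change",
    "global warming",
    "greenhouse gas",
    "carbon emissions",
)

# One case-insensitive pattern that matches any of the keywords.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in CLIMATE_KEYWORDS), re.IGNORECASE
)


def keyword_detect(sentence: str) -> bool:
    """Return True if the sentence contains any keyword from the list."""
    return _PATTERN.search(sentence) is not None


# Illustrative sentences (invented, not drawn from ClimaText).
explicit = "Global warming is accelerating the loss of Arctic sea ice."
implicit = "Rising seas are forcing coastal towns to relocate inland."

print(keyword_detect(explicit))  # True: contains "global warming"
print(keyword_detect(implicit))  # False: climate-related, but no keyword matches
```

The second sentence is arguably about climate change yet contains none of the listed phrases, which is the kind of implicit case where a context-based classifier would be expected to do better than keyword matching.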
Abstract
Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question-answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex, fast-moving, and often ambiguous topic with scarce resources for popular text-based AI tasks. In this paper, we introduce ClimaText, a dataset for sentence-based climate change topic detection, which we make publicly available. We explore different approaches to identify the climate change topic in various text sources. We find that popular keyword-based models are not adequate for such a complex and evolving task. Context-based algorithms like BERT (Devlin et al., 2018) can detect,…
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques
Methods: Linear Layer · WordPiece · Residual Connection · Attention Dropout · Weight Decay · Multi-Head Attention · Dense Connections · Attention Is All You Need · Adam · Softmax
