Curating corpora with classifiers: A case study of clean energy sentiment online
Michael V. Arnold, Peter Sheridan Dodds, Christopher M. Danforth

TL;DR
This paper demonstrates how transformer-based classifiers can curate social media corpora, specifically by filtering out irrelevant tweets ahead of clean energy sentiment analysis, with high accuracy and low cost.
Contribution
It introduces a method of using fine-tuned transformer models for rapid, accurate corpus curation in social media analysis, improving over keyword-based filtering.
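The two-stage idea described here (a broad keyword query followed by a classifier that discards off-topic matches) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `curate`, `keyword_match`, and `toy_relevance_score` are hypothetical names, the example posts are invented, and `toy_relevance_score` is a trivial placeholder standing in for a fine-tuned transformer's relevance probability.

```python
from typing import Callable, Iterable, List


def keyword_match(text: str, keywords: Iterable[str]) -> bool:
    """Stage 1: broad keyword query (case-insensitive substring match)."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def curate(
    posts: Iterable[str],
    keywords: Iterable[str],
    relevance_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Stage 2: keep keyword matches whose relevance score meets the threshold."""
    keywords = list(keywords)
    candidates = [p for p in posts if keyword_match(p, keywords)]
    return [p for p in candidates if relevance_score(p) >= threshold]


def toy_relevance_score(text: str) -> float:
    # ASSUMPTION: placeholder only. In practice this would return the
    # probability assigned by a fine-tuned transformer classifier.
    energy_terms = ("energy", "power", "panel", "turbine", "electricity")
    return 1.0 if any(t in text.lower() for t in energy_terms) else 0.0


# Invented example posts; not data from the paper.
posts = [
    "New solar panels cut our electricity bill in half",
    "Solar eclipse viewing party this weekend",
    "Wind turbines going up along the ridge",
    "The wind ruined my hair today",
    "Great pasta recipe",
]
keywords = ["solar", "wind"]

curated = curate(posts, keywords, toy_relevance_score)
print(curated)
```

The keyword stage alone passes four of the five posts, including the eclipse and bad-hair-day posts; the scoring stage is what removes them. Swapping in a real model only requires passing a different `relevance_score` callable.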
Findings
Fine-tuned classifiers achieved F1 scores up to 0.95 when separating relevant from irrelevant tweets.
Fine-tuning transformer models is cost-effective and highly accurate.
Method enhances real-time social media data analysis pipelines.
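For readers unfamiliar with the metric cited above, F1 is the harmonic mean of precision and recall on the "relevant" class. A minimal sketch of the computation follows; the function name and the labels are made up for illustration and are not the paper's data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 = relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = relevant tweet, 0 = irrelevant tweet.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here four relevant tweets are correctly kept, one irrelevant tweet slips through, and one relevant tweet is missed, so precision, recall, and F1 all come out near 0.8.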
Abstract
Well-curated, large-scale corpora of social media posts containing broad public opinion offer an alternative data source to complement traditional surveys. While surveys are effective at collecting representative samples and are capable of achieving high accuracy, they can be both expensive to run and lag public opinion by days or weeks. Both of these drawbacks could be overcome with a real-time, high-volume data stream and fast analysis pipeline. A central challenge in orchestrating such a data pipeline is devising an effective method for rapidly selecting the best corpus of relevant documents for analysis. Querying with keywords alone often includes irrelevant documents that are not easily disambiguated with bag-of-words natural language processing methods. Here, we explore methods of corpus curation to filter irrelevant tweets using pre-trained transformer-based models, fine-tuned…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Recommender Systems and Techniques
