Structured dataset of reported cloud seeding activities in the United States (2000-2025) using an LLM
Jared Joseph Donohue, Kara D. Lamb

TL;DR
This paper introduces a structured dataset of reported U.S. cloud seeding activities from 2000 to 2025, extracted from NOAA reports using a multi-stage PDF-to-text pipeline and OpenAI's o3 large language model, with an estimated accuracy of 98.38%. The dataset is intended to support analysis of weather modification efforts.
Contribution
The work presents a novel, scalable method combining LLMs and PDF extraction to compile a detailed cloud seeding dataset, filling a significant data gap in environmental research.
Findings
Dataset built from 832 historical NOAA reports, with 98.38% estimated accuracy based on manual review of 200 randomly sampled records
Demonstrates LLMs' effectiveness in extracting structured data from PDFs
Provides a publicly available resource for climate and weather modification studies
Abstract
Cloud seeding, a weather modification technique used to increase precipitation, has been practiced in the western United States since the 1940s. However, comprehensive datasets are not currently available to analyze these efforts. To address this gap, we present a structured dataset of reported cloud seeding activities in the U.S. from 2000-2025, including the project name, year, season, state, operator, seeding agent, apparatus used for deployment, stated purpose, target area, control area, start date, and end date. Combining our multi-stage PDF-to-text extraction pipeline with OpenAI's o3 large language model (LLM), we processed 832 historical reports from the National Oceanic and Atmospheric Administration (NOAA). The resulting dataset demonstrates 98.38% estimated accuracy, based on manual review of 200 randomly sampled records, and is publicly available on Zenodo. This dataset…
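The record structure listed in the abstract, and the idea of drawing 200 records at random for manual review, can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the class `SeedingRecord`, its field names and types, and the helpers `sample_for_review` and `field_accuracy` are illustrative assumptions, not the authors' schema or code. In particular, the abstract does not say how the 98.38% figure is computed, so the per-field fraction below is only one possible definition.

```python
import random
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class SeedingRecord:
    """One reported activity; field names are assumed, mirroring the abstract's list."""
    project_name: str
    year: int
    season: Optional[str] = None
    state: Optional[str] = None
    operator: Optional[str] = None
    seeding_agent: Optional[str] = None
    apparatus: Optional[str] = None
    purpose: Optional[str] = None
    target_area: Optional[str] = None
    control_area: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None


def sample_for_review(records: List[SeedingRecord], k: int = 200,
                      seed: int = 0) -> List[SeedingRecord]:
    """Draw up to k records uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the review sample is reproducible
    return rng.sample(records, min(k, len(records)))


def field_accuracy(reviewed: List[Dict[str, bool]]) -> float:
    """Fraction of manually checked fields judged correct (assumed metric).

    Each dict maps a field name to True if the reviewer judged it correct.
    """
    total = sum(len(r) for r in reviewed)
    correct = sum(sum(r.values()) for r in reviewed)
    return correct / total if total else float("nan")
```

A reviewer's judgments would then be passed in as dictionaries, for example `field_accuracy([{"state": True, "year": False}])`, which treats each checked field as one unit of the estimate.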
Taxonomy
Topics: Soybean genetics and cultivation · Soil Carbon and Nitrogen Dynamics · Soil and Land Suitability Analysis
