# A comprehensive UK crop yield dataset incorporating satellite, weather, and soil type information

**Authors:** Evangeline Corcoran, Daniel P. Bebber, Stelian Curceac, Natalia Efremova, Azam Lashkari, Andrew Mead, Richard J. Morris, Richard F. Pywell, John W. Redhead, Sebastian E. Ahnert

PMC · DOI: 10.1038/s41597-025-06528-x · 2026-02-20

## TL;DR

This paper introduces a large UK crop yield dataset combining satellite, weather, and soil data for agricultural research and machine learning applications.

## Contribution

The contribution is CYCleSS, an anonymized dataset that integrates precision yield data with satellite, weather, and soil type data for crop yield prediction and modeling.

## Key findings

- The CYCleSS dataset combines precision yield data from 934 fields across England with satellite, weather, and soil type data, all aligned on a 10 km grid.
- Anonymization preserves data alignment while protecting privacy, offering a solution for agricultural data sharing.
- The dataset supports both machine learning and mechanistic crop growth model parameterization.

## Abstract

Agricultural research increasingly relies on data-driven approaches for crop yield prediction that complement more established crop growth models, including machine learning techniques. However, these approaches rely on large training datasets. Here, we present the Crop Yields, Climate, Soils, and Satellites (CYCleSS) dataset, a large-scale crop yield dataset derived from precision yield data for 934 fields across England on which a variety of crops are grown. In addition, the data also contains satellite-derived remote sensing data, weather data, and data on soil type, all aligned at a grid resolution of 10 km. Weather data is available at a daily temporal resolution, satellite data at 5-day resolution, while crop yield data is available at yearly resolution. This effort has been made possible through careful anonymisation of the yield data while preserving the alignment with remote sensing, weather, and soil data. This data will be useful both to train machine learning models of yield prediction as well as to parameterize mechanistic crop growth models. Furthermore, the anonymisation procedure itself will be of interest to the research community, as it represents a solution to a common problem on the interface of agricultural research and farming practice.
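The abstract describes three temporal resolutions (daily weather, 5-day satellite, yearly yield) aligned on a shared 10 km grid. The sketch below illustrates one possible way to bring such series to the yearly resolution of the yield records before model training. It is not the authors' pipeline: the record layouts, the `cell_id` key, the field names `mean_temp_c` and `mean_satellite_index`, and all toy values are assumptions made purely for illustration, and the actual file format of CYCleSS may differ.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def yearly_mean(records):
    """Average (cell_id, date, value) records per (cell_id, year)."""
    buckets = defaultdict(list)
    for cell_id, day, value in records:
        buckets[(cell_id, day.year)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def build_training_rows(yields, weather, satellite):
    """Join yearly yield records with yearly means of the finer series.

    Yield records without both weather and satellite coverage are skipped.
    """
    weather_by_year = yearly_mean(weather)
    satellite_by_year = yearly_mean(satellite)
    rows = []
    for cell_id, year, crop, yield_t_ha in yields:
        key = (cell_id, year)
        if key not in weather_by_year or key not in satellite_by_year:
            continue
        rows.append({
            "cell_id": cell_id,
            "year": year,
            "crop": crop,
            "yield_t_ha": yield_t_ha,
            "mean_temp_c": weather_by_year[key],
            "mean_satellite_index": satellite_by_year[key],
        })
    return rows


# Toy data (hypothetical values, not taken from the dataset).
start = date(2020, 1, 1)
# Daily weather for one grid cell: temperature alternates 10.0 / 11.0.
weather = [("cell_A", start + timedelta(days=i), 10.0 + (i % 2))
           for i in range(366)]
# Satellite observations every 5 days with a constant index value.
satellite = [("cell_A", start + timedelta(days=i), 0.5)
             for i in range(0, 366, 5)]
# Yearly yields; cell_B has no weather or satellite records.
yields = [
    ("cell_A", 2020, "winter wheat", 8.2),
    ("cell_B", 2020, "oilseed rape", 3.5),
]

rows = build_training_rows(yields, weather, satellite)
```

A whole-year mean is the simplest possible aggregate; in practice one might instead aggregate over a crop-specific growing season or keep the full time series as model input.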

## Full-text entities

- **Diseases:** dry (MESH:D015352)
- **Chemicals:** water (MESH:D014867), nitrogen (MESH:D009584)
- **Species:** Beta vulgaris subsp. vulgaris (field beet, subspecies) [taxon 3555], Avena sativa (cultivated oat, species) [taxon 4498], Solanum tuberosum (potatoes, species) [taxon 4113], Triticum aestivum (bread wheat, species) [taxon 4565], x Triticosecale (triticale, genus) [taxon 49317], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Triticum turgidum subsp. durum (durum wheat, subspecies) [taxon 4567], Brassica napus (oilseed rape, species) [taxon 3708], Helianthus annuus (common sunflower, species) [taxon 4232]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13035921/full.md

---
Source: https://tomesphere.com/paper/PMC13035921