UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction
Emiliya Khidirova, Oktay Karakuş

TL;DR
UniCrop is a versatile data pipeline that automates multi-source environmental data processing for scalable crop yield prediction, improving model accuracy and operational deployment.
Contribution
It introduces a universal, reusable data engineering framework that automates data acquisition, cleaning, and feature reduction for crop yield prediction across regions and crops.
Findings
LightGBM achieved RMSE = 465.1 kg/ha and R^2 = 0.6576.
An ensemble of the baseline models improved accuracy slightly, to RMSE = 463.2 kg/ha and R^2 = 0.6604.
UniCrop reduces data engineering effort and enhances scalability in crop yield modeling.
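For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how RMSE and R^2 are computed, together with a simple averaging ensemble. The averaging rule, the function names, and the toy yield values are assumptions made for illustration; the summary does not state how the paper's ensemble combines its baseline models, and none of the numbers below come from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. kg/ha)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def average_ensemble(predictions):
    """Unweighted mean of per-model predictions (an assumed ensembling rule)."""
    return [sum(col) / len(col) for col in zip(*predictions)]

# Toy yields in kg/ha; purely illustrative, not data from the paper.
y_true = [3000, 3500, 4000, 4500]
model_a = [3100, 3400, 4200, 4400]   # hypothetical baseline model A
model_b = [2900, 3700, 3900, 4700]   # hypothetical baseline model B

ensemble = average_ensemble([model_a, model_b])
print("model A :", rmse(y_true, model_a), r2(y_true, model_a))
print("ensemble:", rmse(y_true, ensemble), r2(y_true, ensemble))
```

In this toy case the errors of the two models partly cancel, so the averaged prediction has a lower RMSE than model A alone. Gains of this kind depend on the models making different errors, which is consistent with the small improvement reported above.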
Abstract
Accurate crop yield prediction relies on diverse data streams, including satellite, meteorological, soil, and topographic information. However, despite rapid advances in machine learning, existing approaches remain crop- or region-specific and require substantial data engineering effort. This limits scalability, reproducibility, and operational deployment. This study introduces UniCrop, a universal and reusable data pipeline designed to automate the acquisition, cleaning, harmonisation, and engineering of multi-source environmental data for crop yield prediction. For any given location, crop type, and temporal window, UniCrop automatically retrieves, harmonises, and engineers over 200 environmental variables (Sentinel-1/2, MODIS, ERA5-Land, NASA POWER, SoilGrids, and SRTM), reducing them to a compact, analysis-ready feature set utilising a structured feature reduction workflow with minimum…
Taxonomy
Topics: Remote Sensing in Agriculture · Smart Agriculture and AI · Soil Geostatistics and Mapping
