CMAViT: Integrating Climate, Management, and Remote Sensing Data for Crop Yield Estimation with Multimodal Vision Transformers
Hamid Kamangir, Brent S. Sams, Nick Dokoozlian, Luis Sanchez, J. Mason Earles

TL;DR
This paper presents CMAViT, a multimodal vision transformer that integrates climate, management, and remote sensing data for accurate vineyard crop yield prediction, outperforming traditional models.
Contribution
Introduction of CMAViT, a novel multi-modal transformer that combines spatial, temporal, and management data for pixel-level crop yield estimation.
Findings
Achieved an R² of 0.84 and a MAPE of 8.22% on unseen data.
Outperformed traditional models like UNet-ConvLSTM.
Modality ablation showed each data type's importance for accuracy.
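For readers unfamiliar with the two reported metrics, the sketch below computes R² (coefficient of determination) and MAPE (mean absolute percentage error) using their standard definitions. The yield values are made-up illustration data, not results from the paper, and how the authors aggregate pixels or blocks before scoring is not stated in this summary.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical observed and predicted yields (arbitrary units).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 18.0, 33.0, 40.0]

r2_value = r2_score(observed, predicted)
mape_value = mape(observed, predicted)
print(f"R2 = {r2_value:.3f}, MAPE = {mape_value:.2f}%")
```

A higher R² and a lower MAPE both indicate a better fit; MAPE is undefined wherever the observed value is zero, so such pixels would need to be masked beforehand.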
Abstract
Crop yield prediction is essential for agricultural planning but remains challenging due to the complex interactions between weather, climate, and management practices. To address these challenges, we introduce a deep learning-based multimodal model called Climate-Management Aware Vision Transformer (CMAViT), designed for pixel-level vineyard yield predictions. CMAViT integrates both spatial and temporal data by leveraging remote sensing imagery and short-term meteorological data, capturing the effects of growing season variations. Additionally, it incorporates management practices, which are represented in text form, using a cross-attention encoder to model their interaction with time-series data. This innovative multimodal transformer, tested on a large dataset from 2016-2019 covering 2,200 hectares and eight grape cultivars including more than 5 million vines, outperforms traditional…
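The cross-attention mechanism mentioned in the abstract can be illustrated with a minimal single-head sketch in NumPy. This is generic scaled dot-product cross-attention, not the authors' implementation: the token counts, embedding width, random weights, and the choice to use the climate time series as queries and the management text as keys and values are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens that attend (assumed: climate time steps).
    context: (n_c, d) tokens attended to (assumed: management text tokens).
    Returns the fused tokens (n_q, d) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # similarity of each query to each context token
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16                                           # hypothetical embedding width
climate_tokens = rng.normal(size=(12, d_model))        # hypothetical: 12 time steps
management_tokens = rng.normal(size=(5, d_model))      # hypothetical: 5 text tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

fused, weights = cross_attention(climate_tokens, management_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each output row is a weighted mix of the management-token values, so every time step can draw on whichever management descriptions are most relevant to it. A full model would add multiple heads, residual connections, and learned weights.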
Taxonomy
Topics: Remote Sensing in Agriculture · Remote Sensing and Land Use · Remote Sensing and LiDAR Applications
Methods: Attention Is All You Need · Label Smoothing · Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings
