SPROUT: A Scalable Diffusion Foundation Model for Agricultural Vision

Shuai Xiang; Wei Guo; James Burridge; Shouyang Liu; Hao Lu; Tokihiro Fukatsu

arXiv:2603.27519 · cs.CV · March 31, 2026

TL;DR

SPROUT is a scalable diffusion-based foundation model for agricultural vision, pre-trained on 2.6 million agricultural images. It outperforms existing web-pretrained and agricultural foundation models at a lower pre-training cost.

Contribution

Introduces SPROUT, a VAE-free pixel-space diffusion transformer for agriculture, pre-trained on a curated set of 2.6 million images, which outperforms prior web-pretrained and agricultural foundation models.

Findings

01

SPROUT outperforms state-of-the-art models on various agricultural tasks.

02

It incurs a significantly lower pre-training cost than existing models.

03

The model effectively learns structure-aware representations through diffusion denoising.

Abstract

Vision Foundation Models (VFMs) pre-trained on large-scale unlabeled data have achieved remarkable success on general computer vision tasks, yet typically suffer from significant domain gaps when applied to agriculture. In this context, we introduce SPROUT (Scalable Plant Representation model via Open-field Unsupervised Training), a multi-crop, multi-task agricultural foundation model trained via diffusion denoising. SPROUT leverages a VAE-free Pixel-space Diffusion Transformer to learn rich, structure-aware representations through denoising, enabling efficient end-to-end training. We pre-train SPROUT on a curated dataset of 2.6 million high-quality agricultural images spanning diverse crops, growth stages, and environments. Extensive experiments demonstrate that SPROUT consistently outperforms state-of-the-art web-pretrained and agricultural foundation models across a…
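To make the idea of denoising directly in pixel space (with no VAE latent) more concrete, here is a minimal NumPy sketch of one such training objective. It is not the SPROUT implementation. The linear noise interpolation, the clean-image prediction target, the patch size, and the names `patchify`, `add_noise`, `denoising_loss`, and `predict_fn` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def patchify(images, patch):
    """Split (N, H, W, C) images into (N, num_patches, patch*patch*C) pixel tokens."""
    n, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = images.reshape(n, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, gh, gw, patch, patch, C)
    return x.reshape(n, gh * gw, patch * patch * c)

def add_noise(x0, t, rng):
    """Blend clean pixels with Gaussian noise; t in [0, 1], one value per sample.

    Assumption: a simple linear interpolation schedule, chosen for illustration.
    """
    noise = rng.standard_normal(x0.shape)
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast over tokens and channels
    return (1.0 - t) * x0 + t * noise, noise

def denoising_loss(predict_fn, x0, t, rng):
    """Mean squared error between the predicted and the clean pixel tokens.

    predict_fn(xt, t) is a hypothetical stand-in for a diffusion transformer.
    """
    xt, _ = add_noise(x0, t, rng)
    pred = predict_fn(xt, t)
    return float(np.mean((pred - x0) ** 2))

# Usage with random data and an identity "model" as a placeholder.
rng = np.random.default_rng(0)
images = rng.random((2, 8, 8, 3))   # two tiny RGB images
tokens = patchify(images, patch=4)  # shape (2, 4, 48)
t = rng.random(2)                   # one noise level per image
loss = denoising_loss(lambda xt, t: xt, tokens, t, rng)
```

Because the tokens are raw pixel patches, the loss is computed against the image itself, so no separately trained autoencoder is involved. That is the general sense in which a pixel-space setup can be trained end to end.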

Code & Models

Repositories

UTokyo-FieldPhenomics-Lab/SPROUT (GitHub)