3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat
Olivia Zumsteg, Jannis Widmer, Yann Bourdé, Norbert Kirchgessner, Andreas Hund, Lukas Roth, Paraskevi Nousi

TL;DR
This paper introduces a hybrid 2D-3D approach with knowledge distillation to accurately and efficiently estimate wheat spike volume from images, reducing inference time significantly while maintaining high accuracy.
Contribution
It proposes a novel training method combining 3D geometric information with 2D image models via knowledge distillation for improved spike volume estimation.
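A minimal sketch of what such a training objective can look like follows. This summary does not say whether the distillation matches model outputs or intermediate features, so the sketch assumes feature-level matching with a mean-squared-error term, which is a common choice. The function name `distillation_loss` and the weight `alpha` are hypothetical. The 3D point cloud model acts as the teacher and the image model as the student.

```python
import numpy as np

def distillation_loss(student_pred, target, student_feat, teacher_feat, alpha=0.5):
    """Hypothetical combined objective: volume regression + feature matching.

    student_pred, target: shape (N,) predicted / reference spike volumes (mm^3)
    student_feat, teacher_feat: shape (N, D) embeddings from the image model
        (student) and the frozen point cloud model (teacher)
    alpha: assumed weight of the distillation term
    """
    student_pred = np.asarray(student_pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Task loss: mean squared error on the regressed volume.
    task = np.mean((student_pred - target) ** 2)
    # Distillation loss: mean squared difference between student and teacher
    # embeddings. The teacher is only needed while training.
    diff = np.asarray(student_feat, dtype=float) - np.asarray(teacher_feat, dtype=float)
    distill = np.mean(diff ** 2)
    return task + alpha * distill

# Toy usage: 2 spikes, 4-dimensional embeddings.
loss = distillation_loss(
    student_pred=[610.0, 720.0],
    target=[600.0, 700.0],
    student_feat=np.zeros((2, 4)),
    teacher_feat=np.ones((2, 4)),
)
print(loss)
```

Because the teacher term only appears in the loss, the trained student can run on images alone at inference time. This is where the speed advantage over a reconstruction-based pipeline comes from.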
Findings
Distilled models reduce the mean absolute error (MAE) from 654.31 mm³ to around 640 mm³.
Inference time decreases from 160 ms to 1.4 ms per spike.
Distillation improves correlation from 0.76 to 0.82.
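The figures above can be put in proportion with a few lines of arithmetic. The sketch also shows how the two metrics are usually computed. The summary does not name the correlation measure, so Pearson's r is an assumption here. The 640 mm³ value is only given approximately, so the relative MAE reduction is approximate too.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error, in the units of the inputs (here mm^3)."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))

def pearson_r(pred, target):
    """Pearson correlation coefficient (assumed to be the reported correlation)."""
    return float(np.corrcoef(pred, target)[0, 1])

# Values taken from the Findings list; 640 is given only approximately.
mae_before, mae_after = 654.31, 640.0      # mm^3
time_before, time_after = 160.0, 1.4       # ms per spike

speedup = time_before / time_after
mae_reduction = (mae_before - mae_after) / mae_before
print(f"speed-up: {speedup:.0f}x, relative MAE reduction: {mae_reduction:.1%}")
# prints: speed-up: 114x, relative MAE reduction: 2.2%
```

Read this way, the efficiency gain is roughly two orders of magnitude, while the MAE improvement is about 2%.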
Abstract
Accurate estimation of wheat spike volume is important for yield component analysis and stress resilience assessment, yet field-based measurement remains challenging. Active 3D sensing methods such as Light Detection and Ranging (LiDAR) or time-of-flight (ToF) are either sensitive to plant motion or poorly suited to outdoor conditions, while 3D reconstructions are computationally expensive. Direct 2D image processing would offer computational advantages, but image-based models lack explicit geometric information. We therefore propose a hybrid 2D-3D approach that uses knowledge distillation during training to transfer geometric information, while still enabling efficient image-only inference. First, we train a rigid-invariant point cloud network on distance-based histogram features to obtain pose-robust geometric representations. We then combine the 3D model with our proposed multi-view, image-based regulated Transformer (RT) in an…
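Distances between points do not change when a point cloud is rotated or translated. A histogram of pairwise distances is therefore a pose-robust descriptor, similar in spirit to the classic D2 shape distribution. The sketch below illustrates that property only. The paper's exact feature definition, bin count and distance range are not given in this summary. The function names and the `bins=32`, `max_dist=100.0` (mm assumed) settings are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """All unique Euclidean distances between points of an (N, 3) cloud."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]

def distance_histogram(points, bins=32, max_dist=100.0):
    """Normalized histogram of pairwise distances (fraction of pairs per bin).

    Fixed bin edges keep absolute size information, which matters for volume.
    Distances beyond max_dist are clipped into the last bin.
    """
    d = np.clip(pairwise_distances(points), 0.0, max_dist)
    counts, _ = np.histogram(d, bins=bins, range=(0.0, max_dist))
    return counts / counts.sum()

def random_rigid_transform(points, rng):
    """Apply a random rotation (via QR) and translation to an (N, 3) cloud."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factor
    if np.linalg.det(q) < 0:      # ensure a proper rotation, not a reflection
        q[:, 0] = -q[:, 0]
    return points @ q.T + rng.normal(size=3) * 50.0

# Synthetic stand-in for a spike point cloud (not real data).
rng = np.random.default_rng(0)
cloud = rng.normal(scale=10.0, size=(200, 3))
h_original = distance_histogram(cloud)
h_moved = distance_histogram(random_rigid_transform(cloud, rng))
print(np.allclose(h_original, h_moved, atol=1e-3))
```

The final line should print `True`. The small tolerance only absorbs floating-point effects for distances lying exactly on a bin edge.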
