Real-Time Branch-to-Tool Distance Estimation for Autonomous UAV Pruning: Benchmarking Five DEFOM-Stereo Variants from Simulation to Jetson Deployment
Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

TL;DR
This paper benchmarks five DEFOM-Stereo variants for real-time UAV branch-to-tool distance estimation, emphasizing the trade-offs between accuracy and inference speed on NVIDIA Jetson hardware for autonomous pruning tasks.
Contribution
It introduces a synthetic dataset for training, evaluates multiple DEFOM-Stereo variants, and proposes DEFOM-PrunePlus as a practical balance for real-time UAV pruning applications.
Findings
DEFOM-Stereo ViT-S achieves the highest accuracy on the synthetic test set but is too slow for real-time use.
DEFOM-PrunePlus offers a good accuracy-speed trade-off suitable for deployment.
Lightweight variants run faster but lack sufficient accuracy for safe actuation.
Abstract
Autonomous tree pruning with unmanned aerial vehicles (UAVs) is a safety-critical real-world task: the onboard perception system must estimate the metric distance from a cutting tool to thin tree branches in real time so that the UAV can approach, align, and actuate the pruner without collision. We address this problem by training five variants of DEFOM-Stereo, a recent foundation-model-based stereo matcher, on a task-specific synthetic dataset and deploying the checkpoints on an NVIDIA Jetson Orin Super 16 GB. The training corpus is built in Unreal Engine 5 with a simulated ZED Mini stereo camera capturing 5,520 stereo pairs across 115 tree instances from three viewpoints at a distance of 2 m; dense EXR depth maps provide exact, spatially complete supervision for thin branches. On the synthetic test set, DEFOM-Stereo ViT-S achieves the best depth-domain accuracy (EPE 1.74 px, D1-all 5.81%,…
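The abstract reports disparity metrics (EPE, D1-all) for a task whose real goal is metric distance. The following minimal sketch shows how the two are commonly related for a rectified stereo pair, and how the two metrics are typically computed. It is not the authors' evaluation code: the function names are hypothetical, the focal length and baseline in the usage note are illustrative values rather than the paper's calibration, and the D1-all outlier rule (error above 3 px and above 5% of the true disparity) is the KITTI convention, assumed here because the summary does not state which definition the paper uses.

```python
import numpy as np


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified pair.

    focal_px and baseline_m are assumed calibration inputs.
    Non-positive disparities are mapped to infinity.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth


def epe(pred_disp, gt_disp):
    """End-point error: mean absolute disparity error in pixels."""
    pred = np.asarray(pred_disp, dtype=np.float64)
    gt = np.asarray(gt_disp, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def d1_all(pred_disp, gt_disp):
    """Percentage of outlier pixels under the KITTI-style rule (assumed):
    error > 3 px AND error > 5% of the ground-truth disparity."""
    pred = np.asarray(pred_disp, dtype=np.float64)
    gt = np.asarray(gt_disp, dtype=np.float64)
    err = np.abs(pred - gt)
    outlier = (err > 3.0) & (err > 0.05 * np.abs(gt))
    return float(100.0 * np.mean(outlier))
```

As a usage illustration with made-up numbers, `disparity_to_depth(np.array([22.05]), focal_px=700.0, baseline_m=0.063)` yields a depth of about 2 m. Because depth varies as 1/d, a fixed disparity error translates into a depth error that grows roughly with the square of the distance, which is why sub-pixel accuracy matters for thin branches near the tool.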
