Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter
Luca Crupi, Elia Cereda, Alessandro Giusti, and Daniele Palossi

TL;DR
This paper presents a CNN-based system for human pose estimation aboard nano-drones that fuses depth and camera images. The models are trained only in simulation yet make robust real-world predictions within tight onboard resource limits.
Contribution
It extends a COTS Crazyflie nano-drone with a multi-zone depth sensor and proposes deep learning models, trained solely in simulation, that transfer effectively to real-world pose estimation.
Findings
58% improvement in horizontal pose error
51% reduction in angular pose error
Effective sim-to-real transfer of models
Abstract
Nano-quadcopters are versatile platforms attracting the interest of both academia and industry. Their tiny form factor, i.e., 10 cm diameter, makes them particularly useful in narrow scenarios and harmless in human proximity. However, these advantages come at the price of ultra-constrained onboard computational and sensorial resources for autonomous operations. This work addresses the task of estimating human pose aboard nano-drones by fusing depth and images in a novel CNN exclusively trained in simulation yet capable of robust predictions in the real world. We extend a commercial off-the-shelf (COTS) Crazyflie nano-drone -- equipped with a 320×240 px camera and an ultra-low-power System-on-Chip -- with a novel multi-zone (8×8) depth sensor. We design and compare different deep-learning models that fuse depth and image inputs. Our models are trained exclusively on…
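To make the idea of fusing an 8×8 depth map with a 320×240 image concrete, here is a minimal sketch of one possible input-level ("early") fusion. It is an illustration under assumptions, not the authors' architecture: the abstract does not say where in the network the two inputs are merged, and the single-channel image, the nearest-neighbour upsampling, and the names upsample_nearest and early_fusion are all hypothetical.

```python
from typing import List

Grid = List[List[float]]  # row-major 2D array of floats


def upsample_nearest(depth: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour upsampling of a small depth grid to out_h x out_w."""
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def early_fusion(image: Grid, depth: Grid) -> List[Grid]:
    """Stack a single-channel image with the upsampled depth map.

    Assumption: fusion happens at the input by adding depth as a second
    channel. Other designs (mid or late fusion) are equally plausible.
    """
    h, w = len(image), len(image[0])
    return [image, upsample_nearest(depth, h, w)]


# Hypothetical inputs: a blank 240x320 image and an 8x8 depth grid whose
# values simply encode the zone index (row * 8 + col) for easy inspection.
image = [[0.0] * 320 for _ in range(240)]
depth = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

fused = early_fusion(image, depth)  # 2 channels, each 240 rows x 320 cols
```

With these sizes, each depth zone covers a 30×40 px block of the image, so the second channel is a coarse, blocky depth map aligned with the camera frame (assuming, for illustration only, that both sensors share the same field of view).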
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
