TL;DR
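This paper introduces DronePose, which pairs a data synthesis pipeline for realistic multimodal UAV data with a single-shot monocular pose estimation model. Training supplements a direct regression objective with a smooth silhouette loss, computed through differentiable rendering, to improve 3D localisation of a UAV that cooperates with a human user.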
Contribution
The paper presents a data synthesis pipeline that produces a realistic multimodal UAV dataset covering both the exocentric user view and the egocentric UAV view, and a single-shot monocular pose estimation method trained with a novel smooth silhouette loss.
Findings
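Supplementing the direct regression objective with the smooth silhouette loss improves pose estimation both qualitatively and quantitatively.
The model is trained on the jointly available photorealistic and synthesized inputs produced by the synthesis pipeline.
The reported gains are measured against traditional silhouette objectives.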
Abstract
In this work, we consider UAVs as cooperative agents supporting human users in their operations. In this context, the 3D localisation of the UAV assistant is an important task that can facilitate the exchange of spatial information between the user and the UAV. To address this in a data-driven manner, we design a data synthesis pipeline to create a realistic multimodal dataset that includes both the exocentric user view and the egocentric UAV view. We then exploit the joint availability of photorealistic and synthesized inputs to train a single-shot monocular pose estimation model. During training, we leverage differentiable rendering to supplement a state-of-the-art direct regression objective with a novel smooth silhouette loss. Our results demonstrate its qualitative and quantitative performance gains over traditional silhouette objectives. Our data and code are available at…
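To give a rough intuition for why a smoothed silhouette objective might help, the sketch below compares a plain IoU silhouette loss with a blurred-mask comparison on toy binary masks. It is an illustrative stand-in and not the paper's formulation: the function names (`box_blur`, `iou_silhouette_loss`, `smooth_silhouette_loss`), the box-filter smoothing, and the `radius` parameter are all assumptions, and no differentiable renderer is involved.

```python
import numpy as np


def box_blur(mask, radius=4):
    """Average each pixel over a (2*radius+1)^2 window (hypothetical smoothing choice)."""
    k = 2 * radius + 1
    h, w = mask.shape
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted copies of the padded mask, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def iou_silhouette_loss(pred, target, eps=1e-8):
    """1 - IoU between two masks; equals 1 for every disjoint pair."""
    inter = np.sum(pred * target)
    union = np.sum(pred + target - pred * target)
    return float(1.0 - inter / (union + eps))


def smooth_silhouette_loss(pred, target, radius=4):
    """Mean squared difference of the blurred masks (assumed form, for illustration)."""
    diff = box_blur(pred, radius) - box_blur(target, radius)
    return float(np.mean(diff ** 2))


def square_mask(col_start, size=32):
    """Toy silhouette: a 6x5 block of ones starting at the given column."""
    mask = np.zeros((size, size), dtype=np.float64)
    mask[12:18, col_start:col_start + 5] = 1.0
    return mask


target = square_mask(8)
near = square_mask(13)   # disjoint from target, directly adjacent
far = square_mask(18)    # disjoint from target, five pixels further away

iou_near = iou_silhouette_loss(near, target)
iou_far = iou_silhouette_loss(far, target)
smooth_near = smooth_silhouette_loss(near, target)
smooth_far = smooth_silhouette_loss(far, target)
```

In this toy setup both disjoint predictions receive the same IoU loss, while the blurred comparison assigns a lower loss to the prediction that is closer to the target. That ordering holds only while the blurred masks still overlap, so the smoothing radius bounds how far the signal reaches.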
