Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion
Bishoy Gerges, Barbara Bazzana, Nicolò Botteghi, Youssef Aboudorra, Antonio Franchi

TL;DR
This paper introduces a novel visual servoing method using latent diffusion models that enables UAVs to reach visual targets even when they are initially not visible, by learning a latent representation for planning.
Contribution
It proposes a new approach combining generative diffusion models with latent representations for vision-based UAV navigation, allowing target reaching without initial visibility.
Findings
Reaches the desired target view in simulation even when the target is initially not visible
Uses latent diffusion models for trajectory planning in UAV navigation
Employs a Cross-Modal Variational Autoencoder for compact image representation
Abstract
In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs) that explores the application of generative models to vision-based navigation of UAVs (Uncrewed Aerial Vehicles). Unlike classical VS methods, the proposed approach allows reaching the desired target view even when the target is initially not visible. This is possible thanks to a learned latent representation that the DDPM uses for planning, together with a dataset of trajectories that includes target-invisible initial views. A compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach has been validated in simulation using two generic multi-rotor UAVs (a quadrotor and a…
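The planning step described in the abstract (a DDPM generating a trajectory in latent space from the current latent state) can be illustrated with a minimal sketch of standard DDPM ancestral sampling. Everything below is an assumption made for illustration and is not taken from the paper: the linear noise schedule, the function names, the latent dimension, and in particular `placeholder_noise_model`, which stands in for a trained noise-prediction network and simply returns zeros. Clamping the first trajectory state to the current latent is one common conditioning strategy in diffusion planners; the paper may condition differently.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (an assumed choice, not from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)

def placeholder_noise_model(x_t, t, z_current):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t, cond).
    It returns zeros so that the sketch runs without any trained weights."""
    return np.zeros_like(x_t)

def sample_latent_trajectory(z_current, horizon, num_steps=50,
                             noise_model=placeholder_noise_model, seed=0):
    """DDPM ancestral sampling of a (horizon, latent_dim) latent trajectory."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise over the whole trajectory.
    x = rng.standard_normal((horizon, z_current.shape[0]))

    for t in range(num_steps - 1, -1, -1):
        # Assumed conditioning: clamp the first state to the current latent.
        x[0] = z_current
        eps = noise_model(x, t, z_current)
        # Posterior mean of the reverse step, given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step (t == 0).
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise

    x[0] = z_current
    return x

if __name__ == "__main__":
    z0 = np.ones(4)  # hypothetical 4-dimensional latent of the current image
    trajectory = sample_latent_trajectory(z0, horizon=8)
    print(trajectory.shape)
```

In a full pipeline, `z_current` would come from the image encoder (the Cross-Modal Variational Autoencoder in the abstract), and the sampled latent states would be passed to whatever controller or decoder turns them into UAV commands; both of those components are outside this sketch.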
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
