Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion

Bishoy Gerges; Barbara Bazzana; Nicolò Botteghi; Youssef Aboudorra; Antonio Franchi

arXiv:2409.13337·cs.RO·April 30, 2025

Open Access

TL;DR

This paper introduces a visual servoing method based on latent diffusion models that lets UAVs reach a visual target even when the target is initially out of view, with planning carried out in a learned latent representation.

Contribution

It proposes a new approach that combines generative diffusion models with latent representations for vision-based UAV navigation, so that a target can be reached even when it is not initially visible.

Findings

01 Successfully reaches targets with initially invisible views in simulation

02 Uses latent diffusion models for trajectory planning in UAV navigation

03 Employs a Cross-Modal Variational Autoencoder for compact image representation

Abstract

In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs) that explores the application of generative models for vision-based navigation of UAVs (Uncrewed Aerial Vehicles). Unlike classical VS methods, the proposed approach allows reaching the desired target view even when the target is initially not visible. This is possible thanks to the learning of a latent representation that the DDPM uses for planning and a dataset of trajectories encompassing target-invisible initial views. A compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space driving the robotic platform to the desired visual target. The approach has been validated in simulation using two generic multi-rotor UAVs (a quadrotor and a…
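
As a rough illustration of the planning step described in the abstract, the sketch below runs standard DDPM ancestral sampling over a short trajectory of latent vectors. It is not the authors' implementation: `toy_noise_predictor` is a hypothetical placeholder for a trained, return-conditioned denoising network, and the linear noise schedule, the horizon, the latent size, and the trick of pinning the first trajectory state to the current latent are all assumptions made for the example.

```python
import numpy as np

def toy_noise_predictor(traj, step, current_latent, target_return):
    # Hypothetical placeholder for a trained, return-conditioned denoiser.
    # It simply treats the offset from the current latent as "noise";
    # a real model would be a neural network learned from trajectory data.
    return (traj - current_latent) * target_return

def sample_latent_trajectory(current_latent, target_return, horizon=8,
                             num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise over the whole latent trajectory.
    traj = rng.standard_normal((horizon, current_latent.shape[0]))
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(traj, t, current_latent, target_return)
        # Standard DDPM posterior mean given the predicted noise.
        mean = (traj - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final denoising step (t == 0).
        noise = rng.standard_normal(traj.shape) if t > 0 else 0.0
        traj = mean + np.sqrt(betas[t]) * noise
        # Assumed conditioning: pin the first state to the current latent.
        traj[0] = current_latent
    return traj

z0 = np.zeros(4)  # stand-in for the encoded current image
plan = sample_latent_trajectory(z0, target_return=1.0)
print(plan.shape)  # (8, 4)
```

In a full pipeline, `z0` would come from the image encoder and the sampled latent trajectory would be handed to a controller; both stages are outside the scope of this sketch.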

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis