Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks
Pranjali Pathre, Gunjan Gupta, M. Nomaan Qureshi, Mandyam Brunda, Samarth Brahmbhatt, K. Madhava Krishna

TL;DR
Imagine2Servo introduces a diffusion-based image editing approach to generate intermediate goal images, overcoming traditional visual servoing limitations and enabling more flexible robotic navigation and manipulation tasks.
Contribution
The paper presents a novel diffusion-driven goal generation method that extends visual servoing to scenarios with minimal image overlap and multi-camera setups.
Findings
Effective in long-range navigation tasks
Handles minimal initial and target image overlap
Supports multi-camera feedback integration
Abstract
Visual servoing, the method of controlling robot motion through feedback from visual sensors, has seen significant advancements with the integration of optical flow-based methods. However, its application remains limited by inherent challenges, such as the necessity for a target image at test time, the requirement of substantial overlap between initial and target images, and the reliance on feedback from a single camera. This paper introduces Imagine2Servo, an innovative approach leveraging diffusion-based image editing techniques to enhance visual servoing algorithms by generating intermediate goal images. This methodology allows for the extension of visual servoing applications beyond traditional constraints, enabling tasks like long-range navigation and manipulation without predefined goal images. We propose a pipeline that synthesizes subgoal images grounded in the task at hand,…
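The alternation the abstract describes (imagine an intermediate goal, servo to it, repeat) can be sketched in a toy setting. This is only an illustrative sketch, not the authors' pipeline: observations are stood in for by 2D position vectors rather than images, `imagine_subgoal` is a hypothetical placeholder for the diffusion-based image editor, and `servo_to` is a plain proportional controller rather than an optical flow-based one. The names `max_step`, `gain`, and `tol` are assumptions introduced for illustration.

```python
import numpy as np


def imagine_subgoal(current, task_target, max_step=1.0):
    """Hypothetical stand-in for the diffusion model.

    Proposes a subgoal no farther than max_step from the current
    observation, mimicking a goal that still overlaps the current view.
    """
    delta = task_target - current
    dist = np.linalg.norm(delta)
    if dist <= max_step:
        return task_target.copy()
    return current + delta * (max_step / dist)


def servo_to(current, subgoal, gain=0.5, tol=1e-3, max_iters=200):
    """Toy proportional servo loop: shrink the error to the subgoal."""
    state = current.copy()
    for _ in range(max_iters):
        error = subgoal - state
        if np.linalg.norm(error) < tol:
            break
        state = state + gain * error
    return state


def run_episode(start, task_target, tol=1e-2, max_subgoals=50):
    """Alternate subgoal generation and servoing until the task is done."""
    state = np.asarray(start, dtype=float)
    target = np.asarray(task_target, dtype=float)
    n_subgoals = 0
    while np.linalg.norm(target - state) > tol and n_subgoals < max_subgoals:
        subgoal = imagine_subgoal(state, target)
        state = servo_to(state, subgoal)
        n_subgoals += 1
    return state, n_subgoals


if __name__ == "__main__":
    final_state, used = run_episode([0.0, 0.0], [5.0, 0.0])
    print(final_state, used)
```

The point of the sketch is structural: the controller only ever tracks a nearby subgoal, so the start and the final target never need to overlap directly. In the toy the generator is handed the target position; in the paper's setting the subgoal is instead synthesized from the current image and the task.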
Taxonomy
Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
