Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories
Alessandro Saviolo, Pratyaksh Rao, Vivek Radhakrishnan, Jiuhong Xiao, and Giuseppe Loianno

TL;DR
This paper presents a unified perception and control framework for quadrotors that leverages foundation models for universal object detection and a resilient, model-free controller, enabling robust real-time visual tracking across diverse scenarios.
Contribution
It introduces a novel perception system based on foundation models combined with a multi-layered tracker and a model-free controller for versatile quadrotor visual tracking.
Findings
Effective in diverse indoor and outdoor environments
Maintains target visibility despite occlusions and lighting changes
Operates efficiently on limited onboard hardware
Abstract
Visual control enables quadrotors to adaptively navigate using real-time sensory data, bridging perception with action. Yet, challenges persist, including generalization across scenarios, maintaining reliability, and ensuring real-time responsiveness. This paper introduces a perception framework grounded in foundation models for universal object detection and tracking, moving beyond specific training categories. Integral to our approach is a multi-layered tracker integrated with the foundation detector, ensuring continuous target visibility, even when faced with motion blur, abrupt light shifts, and occlusions. Complementing this, we introduce a model-free controller tailored for resilient quadrotor visual tracking. Our system operates efficiently on limited hardware, relying solely on an onboard camera and an inertial measurement unit. Through extensive validation in diverse…
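The abstract gives no detail on how the controller is built, so the following is only a minimal sketch of the general idea of steering from image measurements alone, without a dynamics model of the vehicle. It is not the authors' controller. The `Box` type, the `tracking_command` function, the gains, the image size, the desired apparent target size, and the sign conventions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detector output: bounding box in pixels."""
    cx: float  # box centre, horizontal
    cy: float  # box centre, vertical
    w: float   # box width
    h: float   # box height


def tracking_command(box, img_w=640, img_h=480,
                     k_yaw=1.0, k_z=1.0, k_x=1.0, target_h_frac=0.4):
    """Proportional image-based tracking command (illustrative only).

    Uses only the bounding box and the image size; no vehicle model.
    Assumed sign conventions: positive yaw_rate turns left, positive vz
    climbs, positive vx moves forward.
    """
    # Normalised horizontal error: > 0 when the target is right of centre.
    ex = (box.cx - img_w / 2) / (img_w / 2)
    # Normalised vertical error: > 0 when the target is below centre.
    ey = (box.cy - img_h / 2) / (img_h / 2)
    # Size error: > 0 when the target looks smaller than desired (too far).
    es = target_h_frac - box.h / img_h
    return {
        "yaw_rate": -k_yaw * ex,  # turn toward the target
        "vz": -k_z * ey,          # descend if the target is low in the image
        "vx": k_x * es,           # approach if the target looks too small
    }


if __name__ == "__main__":
    # Target right of centre and smaller than desired.
    print(tracking_command(Box(cx=480, cy=240, w=60, h=96)))
```

A real system would add rate limits, filtering of the detections, and a strategy for frames in which the target is lost; none of that is shown here.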
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Optical Sensing Technologies
