CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications
Jan Blumenkamp, Steven Morad, Jennifer Gielis, Amanda Prorok

TL;DR
CoViS-Net is a decentralized visual spatial foundation model that enables real-time multi-robot pose estimation and spatial understanding without relying on existing network infrastructure. It is demonstrated on multi-robot formation control tasks.
Contribution
The paper introduces a fully decentralized, platform-agnostic model for multi-robot spatial understanding that works even without camera overlap between robots and without existing networking infrastructure.
Findings
Provides accurate relative pose estimates
Enables real-time spatial comprehension
Supports multi-robot formation control
Abstract
Autonomous robot operation in unstructured environments is often underpinned by spatial understanding through vision. Systems composed of multiple concurrently operating robots additionally require access to frequent, accurate and reliable pose estimates. In this work, we propose CoViS-Net, a decentralized visual spatial foundation model that learns spatial priors from data, enabling pose estimation as well as spatial comprehension. Our model is fully decentralized, platform-agnostic, executable in real-time using onboard compute, and does not require existing networking infrastructure. CoViS-Net provides relative pose estimates and a local bird's-eye-view (BEV) representation, even without camera overlap between robots (in contrast to classical methods). We demonstrate its use in a multi-robot formation control task across various real-world settings. We provide code, models and…
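To illustrate how a relative pose estimate could feed a formation control task, here is a minimal sketch. It is not the authors' controller: the `RelPose2D` type, the `formation_command` function, the proportional gain, and the assumption of a holonomic robot commanded in its own body frame are all hypothetical choices made for illustration. The relative pose is assumed to come from some estimator such as the one the abstract describes.

```python
import math
from dataclasses import dataclass


@dataclass
class RelPose2D:
    """Hypothetical 2D relative pose of a neighbor, expressed in this robot's body frame."""
    x: float    # neighbor position along this robot's forward axis (m)
    y: float    # neighbor position along this robot's left axis (m)
    yaw: float  # neighbor heading relative to this robot's heading (rad)


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def formation_command(est: RelPose2D, desired: RelPose2D,
                      gain: float = 0.5, max_speed: float = 1.0):
    """Proportional body-frame command (vx, vy, wz) that drives the estimated
    relative pose of the neighbor toward the desired one.

    Assumes a holonomic robot; gain and max_speed are illustrative values.
    """
    # Position error: where the neighbor is versus where it should appear.
    ex = est.x - desired.x
    ey = est.y - desired.y
    # Heading error, wrapped so the robot turns the short way round.
    eyaw = wrap_angle(est.yaw - desired.yaw)

    # Moving along the error reduces it, because the neighbor's relative
    # position shrinks as this robot approaches.
    vx, vy = gain * ex, gain * ey

    # Saturate the linear speed while preserving direction.
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        scale = max_speed / speed
        vx, vy = vx * scale, vy * scale

    return vx, vy, gain * eyaw


if __name__ == "__main__":
    # Neighbor estimated 2 m ahead; formation wants it 1 m ahead.
    print(formation_command(RelPose2D(2.0, 0.0, 0.0), RelPose2D(1.0, 0.0, 0.0)))
```

Because both the estimate and the command live in the robot's own body frame, a controller of this kind needs no global coordinate frame or external localization, which is consistent with the decentralized setting described above.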
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
