Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation
Srijan Dokania, Dharini Raghavan

TL;DR
Zero-Splat TeleAssist is a zero-shot sensor-fusion pipeline that estimates 6-DoF robot poses in real time from commodity CCTV streams, without fiducials or depth sensors, to support multilateral teleoperation.
Contribution
Introduces a sensor-fusion framework that combines vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3D Gaussian Splatting for zero-shot, real-time robot pose estimation.
Findings
Provides real-time global robot positions and orientations
Operates without fiducials or depth sensors
Enables interaction-centric teleoperation
Abstract
We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots without fiducials or depth sensors in an interaction-centric teleoperation setup.
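The abstract names weighted-PCA pose extraction as one stage of the pipeline without spelling it out. The sketch below illustrates the general technique only and is not the authors' implementation. It assumes the robot has already been reduced to a set of 3D points (for example, segmented pixels lifted to 3D using an estimated depth map) with one non-negative confidence weight per point. The function name `weighted_pca_pose` and its parameters `points` and `weights` are hypothetical.

```python
import numpy as np


def weighted_pca_pose(points, weights):
    """Estimate a position and orientation from weighted 3D points.

    points  : (N, 3) array of 3D points belonging to one object (assumed input).
    weights : (N,) array of non-negative per-point confidences (assumed input).

    Returns (centroid, axes), where `centroid` is the weighted mean and the
    columns of `axes` are the principal directions, ordered from largest to
    smallest variance and forming a right-handed frame.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Normalize the weights so they sum to one.
    w = weights / weights.sum()

    # Position: weighted centroid of the point set.
    centroid = w @ points

    # Weighted covariance of the centered points.
    centered = points - centroid
    cov = (centered * w[:, None]).T @ centered

    # Eigen-decomposition of the symmetric covariance matrix.
    # np.linalg.eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]

    # Flip the last axis if needed so the frame is a proper rotation (det = +1).
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1

    return centroid, axes
```

The centroid and the three axes together give a 6-DoF estimate. Note that PCA fixes each axis only up to sign, so a real system would need an extra cue (such as motion direction or a known asymmetry of the robot) to choose a consistent heading; that step is outside this sketch.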
Taxonomy
Topics: Teleoperation and Haptic Systems · Robot Manipulation and Learning · Hand Gesture Recognition Systems
