Multi-modal Tracking for Object based SLAM
Prateek Singhal, Ruffin White, Henrik Christensen

TL;DR
This paper introduces an on-line 3D visual object tracking framework for monocular cameras that fuses spatial knowledge and uncertainty from semantic mapping with high-frequency visual odometry measurements to improve object tracking accuracy on challenging sequences.
Contribution
It presents a framework that integrates semantic mapping and visual odometry data through information-based fusion/arbitration, improving object tracking performance for object-based SLAM.
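The summary does not spell out the fusion rule. A common way to combine two Gaussian estimates of the same quantity is information-form (inverse-covariance) fusion, with a gating test deciding whether to fuse or to arbitrate between the sources. The sketch assumes that the semantic-map estimate and the visual-odometry estimate of an object's 3D position are independent Gaussians; the function names, the chi-square gate, and the "keep the more informative estimate" arbitration rule are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

# Hypothetical gate: approximate 95% chi-square quantile for 3 degrees of freedom.
CHI2_GATE_3DOF = 7.815


def information_fusion(mu_a, cov_a, mu_b, cov_b):
    """Fuse two independent Gaussian estimates of the same 3D position in information form."""
    info_a = np.linalg.inv(cov_a)          # information matrix of source A
    info_b = np.linalg.inv(cov_b)          # information matrix of source B
    info = info_a + info_b                 # information adds for independent estimates
    eta = info_a @ mu_a + info_b @ mu_b    # combined information vector
    cov = np.linalg.inv(info)              # back to covariance form
    return cov @ eta, cov


def fuse_or_arbitrate(mu_a, cov_a, mu_b, cov_b, gate=CHI2_GATE_3DOF):
    """Fuse if the estimates are consistent, otherwise keep the more certain one (assumed policy)."""
    diff = mu_a - mu_b
    # Squared Mahalanobis distance between the two estimates.
    d2 = diff @ np.linalg.solve(cov_a + cov_b, diff)
    if d2 <= gate:
        return information_fusion(mu_a, cov_a, mu_b, cov_b)
    # Inconsistent estimates: pick the one with the smaller covariance volume.
    if np.linalg.det(cov_a) <= np.linalg.det(cov_b):
        return mu_a, cov_a
    return mu_b, cov_b
```

For example, with toy values, a semantic-map estimate at (1.0, 0.0, 0.5) m with covariance 0.04 I and a visual-odometry estimate at (1.1, 0.0, 0.5) m with covariance 0.01 I pass the gate and fuse to (1.08, 0.0, 0.5) m with covariance 0.008 I, pulled toward the more certain odometry estimate. Working in information form makes the fusion a simple sum, which suits a high-rate odometry stream.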
Findings
Achieved a mean per-frame tracking error of 0.23 m
Achieved a 9% lower relative error than a state-of-the-art tracker
Evaluated on 6 challenging sequences with multiple objects against motion-capture ground truth
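The summary does not define the error metrics. Assuming the per-frame error is the Euclidean distance between the tracked object position and the motion-capture position in each frame, averaged over the sequence, and reading "9% lower relative error" as a fractional reduction with respect to the baseline tracker, the two figures could be computed as in this sketch. The function names and toy arrays are hypothetical and are not data from the paper.

```python
import numpy as np


def mean_per_frame_error(estimated, ground_truth):
    """Mean Euclidean position error over frames; inputs are N x 3 arrays in metres."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    per_frame = np.linalg.norm(estimated - ground_truth, axis=1)  # error in each frame
    return per_frame.mean()


def relative_error_reduction(err_ours, err_baseline):
    """Fractional reduction of err_ours with respect to err_baseline (assumed definition)."""
    return (err_baseline - err_ours) / err_baseline


# Toy two-frame example: each frame is off by 0.2 m, so the mean error is 0.2 m.
toy_error = mean_per_frame_error([[0, 0, 0], [1, 0, 0]], [[0, 0, 0.2], [1, 0.2, 0]])
```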
Abstract
We present an on-line 3D visual object tracking framework for monocular cameras that incorporates spatial knowledge and uncertainty from semantic mapping along with high-frequency measurements from visual odometry. By tightly integrating vision and odometry, we can increase the overall performance of object-based tracking for semantic mapping. We integrate the two data sources into a coherent framework through information-based fusion/arbitration. We demonstrate the framework in the context of OmniMapper [1] and present results on 6 challenging sequences over multiple objects, compared against data obtained from a motion capture system. We achieve a mean per-frame tracking error of 0.23 m, a 9% lower relative error than a state-of-the-art tracker.
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
