MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation
Hanwei Zhang, Hideaki Uchiyama, Shintaro Ono, Hiroshi Kawasaki

TL;DR
MOTSLAM is a monocular dynamic SLAM system that integrates multiple object tracking, neural network-based depth estimation, and joint optimization to accurately track both camera and dynamic objects in real-world scenarios.
Contribution
This paper introduces MOTSLAM, a novel monocular dynamic SLAM system that combines MOT, depth estimation, and joint bundle adjustment for improved dynamic scene understanding.
Findings
Achieves state-of-the-art performance on KITTI for camera ego-motion
Effectively tracks dynamic objects with 3D bounding boxes
Joint optimization enhances accuracy of static and dynamic map points
Abstract
Visual SLAM systems targeting static scenes have been developed with satisfactory accuracy and robustness. Dynamic 3D object tracking has since become a significant capability in visual SLAM, driven by the need to understand dynamic surroundings in scenarios such as autonomous driving and augmented and virtual reality. However, performing dynamic SLAM solely with monocular images remains a challenging problem due to the difficulty of associating dynamic features and estimating their positions. In this paper, we present MOTSLAM, a monocular dynamic visual SLAM system that tracks both the poses and the bounding boxes of dynamic objects. MOTSLAM first performs multiple object tracking (MOT) with associated 2D and 3D bounding box detection to create initial 3D objects. Then, neural-network-based monocular depth estimation is applied to fetch the depth of…
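The abstract mentions neural-network-based monocular depth estimation as a source of depth. One standard way to turn a pixel and a predicted depth into a 3D point is pinhole back-projection. The sketch below illustrates only that generic step and is not the authors' implementation; the function name `backproject` and the intrinsics values used in the example call are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a predicted depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion) with focal lengths
    fx, fy and principal point (cx, cy), all in pixels. `depth` is the
    distance along the optical axis (z), in the same unit as the output.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx  # horizontal offset from the optical axis
    y = (v - cy) * depth / fy  # vertical offset from the optical axis
    return (x, y, depth)


# Hypothetical intrinsics, chosen only to make the arithmetic easy to follow.
point = backproject(u=670.0, v=180.0, depth=10.0,
                    fx=700.0, fy=700.0, cx=600.0, cy=180.0)
```

A pixel 70 px to the right of the principal point at a depth of 10 units lands 1 unit to the right of the optical axis, since 70 × 10 / 700 = 1.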
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
