RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields
Haochen Jiang, Yueming Xu, Kejie Li, Jianfeng Feng, Li Zhang

TL;DR
RoDyn-SLAM introduces a neural radiance field-based dynamic RGB-D SLAM framework that effectively handles dynamic environments by filtering invalid rays, improving pose estimation, and achieving state-of-the-art accuracy and robustness.
Contribution
It presents a novel dynamic SLAM method using neural radiance fields with motion masks and a divide-and-conquer pose optimization for better performance in dynamic scenes.
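The divide-and-conquer idea can be illustrated with a small, hypothetical sketch. This summary does not say how keyframes are chosen or how non-keyframe poses are optimized. The fixed-interval keyframe policy and the anchoring of each non-keyframe to its preceding keyframe below are therefore assumptions, not the paper's algorithm, and `split_keyframes` and `compose_world_poses` are illustrative names.

```python
import numpy as np

def split_keyframes(num_frames, kf_interval):
    # Hypothetical policy: every kf_interval-th frame becomes a keyframe.
    keyframes = list(range(0, num_frames, kf_interval))
    # Each non-keyframe is anchored to the most recent preceding keyframe.
    anchor = {i: (i // kf_interval) * kf_interval
              for i in range(num_frames) if i % kf_interval != 0}
    return keyframes, anchor

def compose_world_poses(kf_poses, rel_poses, anchor):
    # Keyframes keep their own (separately optimized) 4x4 world poses; each
    # non-keyframe is expressed as T_world_kf @ T_kf_frame, so refining a
    # keyframe automatically moves the frames anchored to it.
    poses = dict(kf_poses)
    for frame, kf in anchor.items():
        poses[frame] = kf_poses[kf] @ rel_poses[frame]
    return poses

keyframes, anchor = split_keyframes(7, 3)       # keyframes [0, 3, 6]
kf_poses = {k: np.eye(4) for k in keyframes}
kf_poses[3][0, 3] = 1.0                         # refine keyframe 3: shift 1 m in x
rel_poses = {f: np.eye(4) for f in anchor}      # stand-in relative poses
poses = compose_world_poses(kf_poses, rel_poses, anchor)
print(poses[4][0, 3])  # 1.0
```

Keeping only keyframes as free variables shrinks the optimization problem; whether RoDyn-SLAM splits the work this way is not stated here.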
Findings
Achieves state-of-the-art accuracy in dynamic environments
Demonstrates robustness across challenging datasets
Enhances geometry constraints with edge warp loss
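One common way to build an edge-based geometric constraint, assumed here for illustration only, is to warp edge pixels from one frame into an adjacent frame using depth, pinhole intrinsics `K`, and a relative pose, and then measure how far they land from the other frame's edges. The exact form of RoDyn-SLAM's edge warp loss is not given in this summary. `edge_warp_loss` and its helpers are hypothetical, and the sketch ignores points behind the camera or outside the image.

```python
import numpy as np

def backproject(uv, depth, K):
    # Lift pixels (N, 2) with depths (N,) to 3D camera-frame points (N, 3).
    ones = np.ones((uv.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([uv, ones]).T).T
    return rays * depth[:, None]

def project(points, K):
    # Pinhole projection of camera-frame points (N, 3) to pixels (N, 2).
    uvw = (K @ points.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def edge_warp_loss(edges_a, depth_a, edges_b, K, T_b_a):
    # Mean distance from frame-A edge pixels, warped into frame B, to the
    # nearest frame-B edge pixel. T_b_a maps A-camera points to B-camera.
    v, u = np.nonzero(edges_a)                    # rows are v, columns are u
    uv = np.stack([u, v], axis=1).astype(float)
    pts_a = backproject(uv, depth_a[v, u], K)
    pts_b = (T_b_a[:3, :3] @ pts_a.T).T + T_b_a[:3, 3]
    warped = project(pts_b, K)
    vb, ub = np.nonzero(edges_b)
    edge_b = np.stack([ub, vb], axis=1).astype(float)
    # Brute-force nearest-edge search; a distance transform would scale better.
    d = np.linalg.norm(warped[:, None, :] - edge_b[None, :, :], axis=2)
    return d.min(axis=1).mean()

K = np.array([[100.0, 0.0, 8.0], [0.0, 100.0, 8.0], [0.0, 0.0, 1.0]])
depth = np.full((16, 16), 2.0)
edges_a = np.zeros((16, 16), dtype=bool); edges_a[:, 5] = True
edges_b = np.zeros((16, 16), dtype=bool); edges_b[:, 6] = True
T = np.eye(4); T[0, 3] = 0.02  # 2 cm x-translation is a 1 px shift at 2 m depth
loss_true = edge_warp_loss(edges_a, depth, edges_b, K, T)          # close to 0
loss_wrong = edge_warp_loss(edges_a, depth, edges_b, K, np.eye(4))  # close to 1
```

The loss is small only when the relative pose explains how edges moved between frames, which is why such a term can add geometric constraints to pose estimation.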
Abstract
Leveraging neural implicit representations for dense RGB-D SLAM has been studied in recent years. However, this approach relies on a static-environment assumption and does not work robustly in dynamic environments because of inconsistent observations of geometry and photometry. To address the challenges of dynamic environments, we propose a novel dynamic SLAM framework with a neural radiance field. Specifically, we introduce a motion mask generation method to filter out invalid sampled rays. This design fuses the optical flow mask and the semantic mask to improve the precision of the motion mask. To further improve the accuracy of pose estimation, we design a divide-and-conquer pose optimization algorithm that distinguishes between keyframes and non-keyframes. The proposed edge warp loss can effectively enhance the geometry constraints between adjacent…
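The ray-filtering step described in the abstract can be sketched as follows. The abstract states that an optical flow mask and a semantic mask are fused, but not how; the logical-OR rule in the hypothetical `fuse_motion_masks` is an assumption, as is uniform sampling of static pixels in `sample_static_rays`. Boolean masks mark dynamic pixels as `True`.

```python
import numpy as np

def fuse_motion_masks(flow_mask, semantic_mask):
    # Assumed fusion rule: flag a pixel as dynamic if either cue flags it.
    return np.logical_or(flow_mask, semantic_mask)

def sample_static_rays(motion_mask, num_rays, rng):
    # Return (u, v) pixel coordinates drawn without replacement, only from
    # static (unmasked) pixels, so no ray passes through a dynamic object.
    v, u = np.nonzero(~motion_mask)
    idx = rng.choice(len(u), size=min(num_rays, len(u)), replace=False)
    return np.stack([u[idx], v[idx]], axis=1)

flow = np.zeros((4, 4), dtype=bool); flow[0, 0] = True   # flow flags one pixel
sem = np.zeros((4, 4), dtype=bool); sem[3, :] = True     # semantics flag a row
mask = fuse_motion_masks(flow, sem)
rays = sample_static_rays(mask, 100, np.random.default_rng(0))
print(mask.sum(), len(rays))  # 5 11
```

A union is the conservative choice: it may discard some static pixels, but it keeps rays from sampling regions that either cue considers moving.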
Peer Reviews
Decision: Submitted to ICLR 2024
The paper presents its algorithm with a clear description. Results are compared against other techniques in the field.
Only two datasets are used for testing. Some acronyms are not defined; for example, what is ATE in the results section? The authors do not appear to address the degenerate cases for computing the fundamental matrix; how does their method handle them? For the comparisons, you should have compared against ORB-SLAM3 and possibly DVO SLAM, representing indirect and direct traditional methods respectively. There is no mention of computation times: can this run in real time, and what computing resources does it require?
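In RGB-D SLAM evaluations, ATE most likely refers to the absolute trajectory error used by the TUM RGB-D benchmark: the root-mean-square translational error between estimated and ground-truth camera positions after a rigid alignment. A minimal NumPy sketch, assuming the two trajectories are already time-associated into matching (N, 3) position arrays; the function names are illustrative.

```python
import numpy as np

def align_rigid(est, gt):
    # Kabsch/Horn least-squares alignment (rotation + translation, no scale):
    # finds R, t minimizing sum ||R @ est_i + t - gt_i||^2.
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_g - R @ mu_e

def ate_rmse(est, gt):
    # Root-mean-square translational error after alignment; est, gt: (N, 3).
    R, t = align_rigid(est, gt)
    residual = gt - (est @ R.T + t)
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))

rng = np.random.default_rng(0)
gt = rng.standard_normal((20, 3))
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([1.0, 2.0, 3.0])  # same trajectory, different frame
print(ate_rmse(est, gt))  # effectively zero (floating-point noise only)
```

Alignment first removes the arbitrary choice of world frame, so ATE measures only the shape of the trajectory error.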
- This paper revisits an old topic in dynamic SLAM, i.e., removing dynamic pixels before the explicit or implicit optimization process. The idea of fusing motion segmentation masks over multiple keyframes is sound, and the reported evaluation is comprehensive and solid. - As a system paper, the authors did a great job covering both the overall system design and the key components (masking and tracking) that contribute to the better performance of the overall system. Writing and visualization are v…
- While the overall writing is good, a few places are worth fixing: e.g., in Section 3.2, Eq. (4), the indices $j$ and $k$ are not introduced; there are also typos such as "Camear" in Fig. 1. A thorough proofread is recommended.
1. The exploration of dynamic SLAM with neural radiance field representation is a relatively new and promising avenue. 2. The paper is well-written and easy to follow. 3. The evaluation results are visually compelling and present a convincing case for the proposed method.
A key concern in this paper is the insufficiently explained rationale for incorporating a neural radiance field representation in dynamic SLAM, along with a noticeable absence of robust baseline comparisons in the evaluation. Please see questions for details.
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms · Modular Robots and Swarm Intelligence
