StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory
Zhiheng Li, Yubo Cui, Jiexi Zhong, Zheng Fang

TL;DR
StreamMOS introduces a streaming LiDAR-based moving object segmentation method with a dual-span memory mechanism and multi-view encoding, improving temporal consistency and accuracy in autonomous driving scenarios.
Contribution
The paper proposes a novel streaming network with short-term and long-term memory for consistent moving object segmentation in LiDAR sequences, incorporating multi-view encoding for enhanced feature extraction.
Findings
Achieves competitive results on the SemanticKITTI and Sipailou Campus datasets.
Uses a dual-span memory (short-term and long-term) to improve the temporal consistency of segmentation.
Employs a multi-view encoder for better motion feature extraction.
Abstract
Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they often focus on transferring temporal cues in a single inference and regard every prediction as independent of others. This may cause inconsistent segmentation results for the same object in different frames. To overcome this issue, we propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences. Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior of moving objects and adopted to enhance current inference by temporal fusion. Meanwhile, we build a long-term memory to store previous…
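The short-term memory idea in the abstract (carrying features from one inference to the next and fusing them with the current frame) can be illustrated with a toy sketch. This is not the authors' implementation: the class `ShortTermMemory`, the function `run_stream`, the blending weight `alpha`, and the weighted-average fusion are all hypothetical stand-ins, and plain lists of floats stand in for real encoder features.

```python
from typing import List, Optional


class ShortTermMemory:
    """Holds the features from the previous inference (hypothetical helper)."""

    def __init__(self, alpha: float = 0.5) -> None:
        # alpha is an assumed blending weight, not a value from the paper.
        self.alpha = alpha
        self.features: Optional[List[float]] = None

    def fuse(self, current: List[float]) -> List[float]:
        """Blend current features with the stored ones, then update the memory."""
        if self.features is None:
            fused = list(current)  # first frame: nothing to fuse with yet
        else:
            # Weighted average as a simple stand-in for temporal fusion.
            fused = [
                self.alpha * c + (1.0 - self.alpha) * h
                for c, h in zip(current, self.features)
            ]
        self.features = fused  # carried over to the next inference
        return fused


def run_stream(frames: List[List[float]], alpha: float = 0.5) -> List[List[float]]:
    """Process frames one at a time, passing state between inferences."""
    memory = ShortTermMemory(alpha)
    return [memory.fuse(frame) for frame in frames]


if __name__ == "__main__":
    # Toy per-frame features; a real system would use encoder outputs.
    print(run_stream([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]))
```

The point of the sketch is only the streaming pattern: each call depends on state left by the previous call, so consecutive outputs change gradually instead of being computed independently per frame.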
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
Methods: Focus · Convolution
