SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow

Zhiyang Lu; Qinghan Chen; Zhimin Yuan; Ming Cheng

arXiv:2408.07825·cs.CV·August 16, 2024

TL;DR

SSRFlow introduces a semantic-aware fusion framework with spatial-temporal re-embedding and domain adaptation, significantly improving scene flow estimation accuracy in real-world LiDAR data by addressing semantic, deformation, and domain gap challenges.

Contribution

The paper proposes three components to improve scene flow estimation in real-world scenarios: Dual Cross Attentive (DCA) fusion, spatial-temporal re-embedding, and domain adaptive losses.

Findings

01. Achieves state-of-the-art performance on multiple datasets.

02. Outperforms existing methods in real-world LiDAR scene flow estimation.

03. Effectively bridges synthetic-to-real domain gap.

Abstract

Scene flow, which provides the 3D motion field of the first frame from two consecutive point clouds, is vital for dynamic scene perception. However, contemporary scene flow methods face three major challenges. Firstly, they lack global flow embedding or only consider the context of individual point clouds before embedding, leading to embedded points struggling to perceive the consistent semantic relationship of another frame. To address this issue, we propose a novel approach called Dual Cross Attentive (DCA) for the latent fusion and alignment between two frames based on semantic contexts. This is then integrated into Global Fusion Flow Embedding (GF) to initialize flow embedding based on global correlations in both contextual and Euclidean spaces. Secondly, deformations exist in non-rigid objects after the warping layer, which distorts the spatiotemporal relation between the…
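The idea behind the first challenge, letting each frame's features attend to the other frame before flow embedding, can be illustrated with a small sketch. This is not the paper's DCA module: it uses plain scaled dot-product attention with no learned projections, and the function names (`cross_attend`, `dual_cross_attention`), the residual fusion, and the feature sizes are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_feats, key_feats):
    # Hypothetical helper: each query point gathers features from the
    # other frame, weighted by scaled dot-product feature similarity.
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)  # (Nq, Nk)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ key_feats, weights

def dual_cross_attention(feats_a, feats_b):
    # Assumed design: attend in both directions, then fuse each frame's
    # own features with the attended features via a residual sum.
    a_from_b, w_ab = cross_attend(feats_a, feats_b)
    b_from_a, w_ba = cross_attend(feats_b, feats_a)
    return feats_a + a_from_b, feats_b + b_from_a, w_ab, w_ba

# Toy per-point features for two consecutive frames (5 and 7 points, 8 dims).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(5, 8))
feats_b = rng.normal(size=(7, 8))
fused_a, fused_b, w_ab, w_ba = dual_cross_attention(feats_a, feats_b)
print(fused_a.shape, fused_b.shape)
```

After this step, every point's feature carries context from the other frame, which is the property the abstract says earlier methods lack. A real implementation would presumably add learned query, key, and value projections and operate on features from a point cloud backbone.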

Taxonomy

Topics: Time Series Analysis and Forecasting · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis