ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

Jinke Li; Xiao He; Chonghua Zhou; Xiaoqiang Cheng; Yang Wen; Dan Zhang

arXiv:2405.04299 · cs.CV · July 15, 2024 · 1 citation

TL;DR

ViewFormer introduces a transformer-based framework utilizing view-guided attention for improved multi-view 3D occupancy perception, effectively aggregating spatial and temporal features to enhance dynamic scene understanding.

Contribution

The paper proposes a novel view attention mechanism and a scalable transformer framework for multi-view 3D occupancy perception, along with a new benchmark for occupancy flow.

Findings

01. Outperforms prior state-of-the-art methods in 3D occupancy tasks

02. Effectively models dynamic scenes with fine-grained flow representation

03. Demonstrates scalability across multiple multi-view 3D perception tasks

Abstract

3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to sensor deployment constraints. To address this issue, we propose our learning-first view attention mechanism for effective multi-view feature aggregation. Moreover, we showcase the scalability of our view attention across diverse multi-view 3D tasks, including map construction and 3D object detection. Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal…
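
To make the grid-map idea in the abstract concrete, here is a minimal sketch of quantizing labeled 3D points into a sparse occupancy grid. It is not taken from the ViewFormer code: the function name `voxelize`, the grid origin, the voxel size, and the integer class labels are all assumptions chosen for illustration, and the paper's actual grid resolution and label set may differ.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(
    points: Iterable[Tuple[Point, int]],
    grid_min: Point,
    voxel_size: float,
    grid_shape: Voxel,
) -> Dict[Voxel, int]:
    """Quantize labeled 3D points into a sparse occupancy grid.

    Hypothetical helper: each point falls into the voxel whose index is
    floor((coordinate - grid origin) / voxel size) along each axis.
    Voxels absent from the returned dict are treated as free space.
    """
    grid: Dict[Voxel, int] = {}
    for xyz, label in points:
        idx = tuple(
            math.floor((c - lo) / voxel_size) for c, lo in zip(xyz, grid_min)
        )
        # Keep only points that land inside the grid bounds.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            grid[idx] = label  # a later point overwrites an earlier one
    return grid


# Illustrative values only: a 4 x 4 x 2 grid with 0.5 m voxels.
points = [
    ((0.1, 0.1, 0.1), 1),    # e.g. a "vehicle" point (assumed label)
    ((0.9, -0.3, 0.2), 2),   # e.g. a "road" point (assumed label)
    ((100.0, 0.0, 0.0), 1),  # outside the grid, dropped
]
occupancy = voxelize(points, grid_min=(-1.0, -1.0, 0.0),
                     voxel_size=0.5, grid_shape=(4, 4, 2))
```

Foreground and background points are handled identically here: every voxel simply carries a class label, which mirrors the abstract's description of occupancy as a representation that does not separate the two.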

Code & Models

Repositories

viewformerocc/viewformer-occ
PyTorch · Official

Models

🤗
viewformer/ViewFormer-Occ
model· ♡ 1

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Computer Graphics and Visualization Techniques · Video Surveillance and Tracking Methods