Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng

TL;DR
This paper introduces HTCL, a hierarchical temporal context learning method that improves camera-based 3D semantic scene completion by selectively modeling the relevant temporal information in history frames. It outperforms existing approaches on the SemanticKITTI and OpenOccupancy benchmarks.
Contribution
The work proposes a novel two-step hierarchical approach to temporal context learning, consisting of cross-frame affinity measurement followed by affinity-based dynamic refinement, which improves scene completion accuracy.
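As an illustration only, the following Python sketch contrasts naive frame stacking with a two-step, affinity-based fusion at a single spatial location. It assumes cosine similarity as the affinity measure, a temperature-scaled softmax over history frames, and a residual add as the refinement. These choices and all function names (`naive_stack`, `affinity_weights`, `affinity_fuse`) are hypothetical and are not taken from the HTCL implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def naive_stack(current: Vector, history: List[Vector]) -> Vector:
    """Baseline: average the current and all history frames with equal weight."""
    frames = [current] + history
    return [sum(col) / len(frames) for col in zip(*frames)]


def affinity_weights(current: Vector, history: List[Vector],
                     temperature: float = 0.1) -> List[float]:
    """Step (a), hypothetical: score each history frame by its cosine affinity
    to the current frame, then normalise the scores with a softmax."""
    scores = [cosine(current, h) / temperature for h in history]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affinity_fuse(current: Vector, history: List[Vector],
                  temperature: float = 0.1) -> Vector:
    """Step (b), hypothetical: add the affinity-weighted history context to the
    current feature as a residual, so low-affinity frames contribute little."""
    weights = affinity_weights(current, history, temperature)
    context = [sum(w * h[i] for w, h in zip(weights, history))
               for i in range(len(current))]
    return [c + x for c, x in zip(current, context)]


current = [1.0, 0.0]
history = [[0.9, 0.1], [0.0, 1.0]]  # second frame is unrelated to the current one
print(naive_stack(current, history))
print(affinity_fuse(current, history))
```

In this toy case the unrelated second history frame receives a near-zero weight under `affinity_fuse`, whereas `naive_stack` mixes it in with a fixed weight of 1/3, which mirrors the motivation for not stacking frames blindly.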
Findings
Ranks 1st on the SemanticKITTI benchmark.
Surpasses LiDAR-based methods in mIoU on the OpenOccupancy benchmark.
Demonstrates effective temporal modeling for scene completion.
Abstract
Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts from limited 2D image observations. Existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work is the decomposition of temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. First, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners…
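The abstract names the components of step (a) without detailing them. Purely as a hypothetical reading, the sketch below computes affinity between a current and a history feature sequence at several pooling scales, keeps each scale's map separate ("isolation"), and averages the scores of several independent learners, modelled here as random linear projections. The pooling scheme, cosine similarity, random projections, and all function names (`multi_scale_affinity`, `make_learners`, and so on) are assumptions for illustration, not HTCL's actual design.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def avg_pool(seq: List[Vector], window: int) -> List[Vector]:
    """Average-pool a 1D sequence of feature vectors over non-overlapping windows."""
    pooled = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def project(matrix: Matrix, v: Vector) -> Vector:
    """Linear projection: one output entry per row of `matrix`."""
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def make_learners(n_learners: int, dim_in: int, dim_out: int,
                  seed: int = 0) -> List[Matrix]:
    """Hypothetical independent learners: each is its own random projection
    (in a trained model these weights would be learned, not random)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]
            for _ in range(n_learners)]


def multi_scale_affinity(current: List[Vector], history: List[Vector],
                         scales: Sequence[int],
                         learners: List[Matrix]) -> Dict[int, List[float]]:
    """Return one affinity map per scale. Scales are kept isolated (never mixed);
    within a scale, every learner scores each position independently and the
    scores are averaged."""
    maps: Dict[int, List[float]] = {}
    for s in scales:
        cur_p, his_p = avg_pool(current, s), avg_pool(history, s)
        per_pos = []
        for c, h in zip(cur_p, his_p):
            scores = [cosine(project(m, c), project(m, h)) for m in learners]
            per_pos.append(sum(scores) / len(scores))
        maps[s] = per_pos
    return maps
```

As a sanity check, a history sequence identical to the current one yields 1.0 at every position of every scale's map, and a negated history sequence yields -1.0, because pooling and projection are both linear.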
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Time Series Analysis and Forecasting
