Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation
Xiang Li, Yupeng Zheng, Pengfei Li, Yilun Chen, Ya-Qin Zhang, Wenchao Ding

TL;DR
DiScene introduces a sparse query-based framework with multi-level knowledge distillation and teacher-guided initialization, significantly improving efficiency and robustness in indoor occupancy prediction for robotics applications.
Contribution
The paper proposes DiScene, a novel sparse query-based occupancy prediction method utilizing multi-level distillation and optimized warm-up, achieving state-of-the-art performance and faster inference.
Findings
Runs at 23.2 FPS without depth priors while outperforming the baseline by 36.1%.
Surpasses existing methods such as EmbodiedOcc by 3.7% while running inference faster.
Demonstrates versatility across various indoor environments and benchmarks.
Abstract
Occupancy prediction provides critical geometric and semantic understanding for robotics but faces efficiency-accuracy trade-offs. Current dense methods suffer from computational waste on empty voxels, while sparse query-based approaches lack robustness in diverse and complex indoor scenes. In this paper, we propose DiScene, a novel sparse query-based framework that leverages multi-level distillation to achieve efficient and robust occupancy prediction. In particular, our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, which transfers hierarchical representations from large teacher models to lightweight students through coordinated alignment across four levels, including encoder-level feature alignment, query-level feature matching, prior-level spatial guidance, and anchor-level high-confidence knowledge transfer, and (2) a Teacher-Guided…
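The abstract names four alignment levels but this excerpt does not give the loss definitions, so the following is only a minimal sketch of how a four-term distillation objective could be combined. It assumes each level is a mean-squared-error term between student and teacher features, and that the anchor level keeps only entries where a teacher confidence score passes a threshold. The function names, the dictionary keys, the weights, and the threshold are all hypothetical and are not taken from the paper.

```python
def mse(student, teacher):
    """Mean squared error between two equal-length flat feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def masked_mse(student, teacher, confidence, threshold=0.5):
    """MSE over entries whose teacher confidence is at least `threshold`.

    Returns 0.0 when no entry passes the threshold (assumed convention).
    """
    kept = [(s, t) for s, t, c in zip(student, teacher, confidence) if c >= threshold]
    if not kept:
        return 0.0
    return sum((s - t) ** 2 for s, t in kept) / len(kept)


def multi_level_distillation_loss(student, teacher, weights, confidence, threshold=0.5):
    """Weighted sum of four per-level terms (hypothetical formulation).

    `student` and `teacher` map the level names "encoder", "query",
    "prior", and "anchor" to flat lists of floats.
    """
    losses = {
        "encoder": mse(student["encoder"], teacher["encoder"]),
        "query": mse(student["query"], teacher["query"]),
        "prior": mse(student["prior"], teacher["prior"]),
        # Only high-confidence teacher anchors contribute to this term.
        "anchor": masked_mse(student["anchor"], teacher["anchor"], confidence, threshold),
    }
    total = sum(weights[name] * value for name, value in losses.items())
    return total, losses


# Toy inputs, chosen only to make the arithmetic easy to follow.
student = {"encoder": [1.0, 2.0], "query": [0.0, 0.0], "prior": [1.0], "anchor": [1.0, 5.0]}
teacher = {"encoder": [1.0, 4.0], "query": [0.0, 0.0], "prior": [0.0], "anchor": [2.0, 0.0]}
weights = {"encoder": 1.0, "query": 1.0, "prior": 1.0, "anchor": 1.0}
confidence = [0.9, 0.1]  # the second anchor is dropped by the 0.5 threshold

total, per_level = multi_level_distillation_loss(student, teacher, weights, confidence)
print(total, per_level)
```

With these toy inputs the low-confidence second anchor is excluded, so its large student-teacher gap does not affect the anchor term. A real implementation would operate on tensors and would likely use different distance functions per level.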
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Multimodal Machine Learning Applications
