Adaptive Multi-Scale Channel-Spatial Attention Aggregation Framework for 3D Indoor Semantic Scene Completion Toward Assisting Visually Impaired

Qi He; XiangXiang Wang; Jingtao Zhang; Yongbin Yu; Hongxiang Chu; Manping Fan; JingYe Cai; Zhenglin Yang

arXiv:2602.16385 · cs.CV · April 16, 2026

TL;DR

This paper introduces an adaptive multi-scale attention framework for monocular 3D semantic scene completion. The framework supports indoor assistive navigation for visually impaired people by improving the detection of small objects and obstacles.

Contribution

It presents an attention-based method that improves 3D scene understanding from monocular RGB images by targeting two problems in 2D-to-3D feature lifting: noise diffusion during back-projection and structural instability in multi-scale feature fusion.

Findings

1. Achieves 27.88% mIoU on the NYUv2 benchmark.
2. Improves small-object detection by 16.9%.
3. Demonstrates real-time performance on a wearable device.
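The headline number above is a mean intersection-over-union (mIoU): per-class IoU on the predicted voxel labels, averaged over semantic classes. Below is a minimal sketch. It assumes dense integer label volumes and excludes an "empty" class from the average, which is common in scene-completion evaluation. The choice of index 0 for "empty" is a hypothetical convention. The paper's exact evaluation mask (which voxels are scored) is not stated in this summary, and the sketch does not apply one.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_class=0):
    """Mean IoU over semantic classes for integer label volumes of equal shape.

    ignore_class is a hypothetical convention (0 = empty space) and is
    left out of the average, as is common for semantic scene completion.
    """
    ious = []
    for c in range(num_classes):
        if c == ignore_class:
            continue
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both volumes: skip it instead of scoring 0 or 1
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy 2x3 "volume": class 1 has IoU 1/2, class 2 has IoU 2/3.
gt = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 2], [2, 2, 0]])
print(mean_iou(pred, gt, num_classes=3))  # ≈ 0.583, i.e. (1/2 + 2/3) / 2
```

Because every class contributes equally to the average, a gain on rare, small classes such as small obstacles can move mIoU noticeably even when those classes cover few voxels.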

Abstract

Independent indoor mobility remains a critical challenge for individuals with visual impairments, largely due to the limited capability of existing assistive systems in detecting fine-grained hazardous objects such as chairs, tables, and small obstacles. These perceptual blind zones substantially increase the risk of collision in unfamiliar environments. To bridge the gap between monocular 3D vision research and practical assistive deployment, this paper proposes an Adaptive Multi-scale Attention Aggregation (AMAA) framework for monocular 3D semantic scene completion using only a wearable RGB camera. The proposed framework addresses two major limitations in 2D-to-3D feature lifting: noise diffusion during back-projection and structural instability in multi-scale fusion. A parallel channel-spatial attention mechanism is introduced to recalibrate lifted features along semantic and…
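The abstract names two ingredients: lifting 2D image features into a 3D voxel grid by back-projection, and a parallel channel-spatial attention that recalibrates the lifted features. The sketch below illustrates both, but it is not the AMAA implementation. It makes three assumptions. First, it uses a pinhole camera with nearest-pixel sampling for lifting. Second, it substitutes a generic scSE-style gate for the paper's attention: a channel squeeze-and-excitation branch and a 1x1x1 spatial branch, applied in parallel and summed. Third, it uses random, hypothetical weights. One plausible reading of "noise diffusion" is visible in the lifting step: every voxel along a camera ray receives the same 2D feature, whether or not it is occupied.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def back_project(feat2d, K, centers):
    """Naive 2D-to-3D lifting (assumed pinhole model, nearest-pixel sampling).

    feat2d: (C, H, W) image features; K: (3, 3) intrinsics;
    centers: (N, 3) voxel centers in camera coordinates, z pointing forward.
    Voxels that project outside the image or lie behind the camera get zeros.
    """
    C, H, W = feat2d.shape
    z = centers[:, 2]
    safe_z = np.where(z > 0, z, 1.0)          # avoid dividing by zero behind the camera
    uvw = centers @ K.T                        # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / safe_z).astype(int)
    v = np.floor(uvw[:, 1] / safe_z).astype(int)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((C, len(z)))
    out[:, valid] = feat2d[:, v[valid], u[valid]]  # same pixel -> same feature along a ray
    return out

def parallel_channel_spatial_attention(feat, w1, w2, w_sp):
    """scSE-style gate, used here as a stand-in for the paper's attention.

    feat: (C, X, Y, Z); w1: (C//r, C) and w2: (C, C//r) are hypothetical
    channel-MLP weights; w_sp: (C,) are hypothetical 1x1x1 conv weights.
    """
    C = feat.shape[0]
    # Channel branch: global average pool over voxels, bottleneck MLP, sigmoid gate per channel.
    pooled = feat.reshape(C, -1).mean(axis=1)
    ch_gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))                 # (C,)
    # Spatial branch: collapse channels with a 1x1x1 conv, sigmoid gate per voxel.
    sp_gate = sigmoid(np.tensordot(w_sp, feat, axes=([0], [0])))       # (X, Y, Z)
    # Parallel fusion: both gates recalibrate the same input; results are summed.
    return feat * ch_gate[:, None, None, None] + feat * sp_gate[None, ...]

rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
feat2d = rng.standard_normal((C, H, W))
K = np.array([[8.0, 0.0, 8.0], [0.0, 8.0, 8.0], [0.0, 0.0, 1.0]])

# Hypothetical 4x4x4 voxel grid in front of the camera.
xs, zs = np.linspace(-0.75, 0.75, 4), np.linspace(1.0, 4.0, 4)
grid = np.stack(np.meshgrid(xs, xs, zs, indexing="ij"), axis=-1).reshape(-1, 3)
lifted = back_project(feat2d, K, grid).reshape(C, 4, 4, 4)
out = parallel_channel_spatial_attention(
    lifted, rng.standard_normal((C // r, C)),
    rng.standard_normal((C, C // r)), rng.standard_normal(C))

# Two voxels on the same ray (depths 1 and 2) receive identical lifted features.
on_ray = back_project(feat2d, K, np.array([[0.5, 0.25, 1.0], [1.0, 0.5, 2.0]]))
```

Both gates take values in (0, 1), so each voxel's output is a reweighting of its lifted feature. That reweighting can suppress features copied into empty space along a ray, but it cannot create information that lifting did not provide. The paper's actual branch design and fusion rule are not given in this excerpt.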
