HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision
Shengli Zhou, Jianuo Zhu, Qilin Huang, Fangjing Wang, Yanfu Zhang, Feng Zheng

TL;DR
HCNQA introduces a hierarchical supervision approach for 3D VQA that guides models through phased concentration narrowing, improving reasoning pathways and overall performance.
Contribution
The paper proposes a novel hierarchical concentration narrowing supervision method for 3D VQA, enhancing reasoning pathway development and model accuracy.
Findings
Improved reasoning pathways in 3D VQA models.
Enhanced accuracy over existing answer-centric methods.
Effective supervision at key reasoning checkpoints.
Abstract
3D Visual Question-Answering (3D VQA) is pivotal for models to perceive the physical world and perform spatial reasoning. Answer-centric supervision is a commonly used training method for 3D VQA models. Many models that utilize this strategy have achieved promising results in 3D VQA tasks. However, the answer-centric approach only supervises the final output of models and allows models to develop reasoning pathways freely. The absence of supervision on the reasoning pathway enables the potential for developing superficial shortcuts through common patterns in question-answer pairs. Moreover, although slow-thinking methods advance large language models, they suffer from underthinking. To address these issues, we propose HCNQA, a 3D VQA model leveraging a hierarchical concentration narrowing supervision method. By mimicking the human process of gradually focusing from a broad area…
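The contrast the abstract draws between answer-centric supervision and supervision at intermediate reasoning checkpoints can be illustrated with a small loss-function sketch. This is not the authors' implementation: the abstract is truncated before it describes the phases, so the checkpoints here are generic, and the names `cross_entropy`, `answer_centric_loss`, `hierarchical_loss`, and the `weights` parameter are all hypothetical. It only assumes that each checkpoint produces logits that can be scored against a target index.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one target index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def answer_centric_loss(answer_logits, answer_target):
    """Supervise only the final answer; the reasoning pathway is unconstrained."""
    return cross_entropy(answer_logits, answer_target)


def hierarchical_loss(checkpoint_logits, checkpoint_targets,
                      answer_logits, answer_target, weights=None):
    """Add one supervised term per intermediate checkpoint (hypothetical design).

    checkpoint_logits: list of logit lists, one per checkpoint.
    checkpoint_targets: list of target indices, one per checkpoint.
    weights: optional per-checkpoint weights (assumed; defaults to 1.0 each).
    """
    if len(checkpoint_logits) != len(checkpoint_targets):
        raise ValueError("each checkpoint needs exactly one target")
    if weights is None:
        weights = [1.0] * len(checkpoint_logits)
    if len(weights) != len(checkpoint_logits):
        raise ValueError("each checkpoint needs exactly one weight")

    total = answer_centric_loss(answer_logits, answer_target)
    for w, logits, target in zip(weights, checkpoint_logits, checkpoint_targets):
        total += w * cross_entropy(logits, target)
    return total
```

With no checkpoints, `hierarchical_loss` reduces to `answer_centric_loss`, which makes the difference explicit: the extra terms penalize a model that reaches the right answer while failing the intermediate steps.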
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Robotics and Sensor-Based Localization
