Learn 3D VQA Better with Active Selection and Reannotation
Shengli Zhou, Yang Liu, Feng Zheng

TL;DR
This paper introduces a multi-turn active learning approach for 3D Visual Question Answering that effectively identifies and reannotates misleading data, improving model performance while reducing training costs.
Contribution
The paper proposes a novel active learning strategy that selects training data by semantic uncertainty and requests reannotation to correct misleading labels in 3D VQA datasets, improving training efficiency.
Findings
Improved 3D VQA accuracy with less training data
Halved training costs for high-accuracy models
Effective identification and correction of misleading annotations
Abstract
3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scene data enlarges the negative effect of misleading annotations. Although active learning strategies can select valuable instances for training, they fail to identify and resolve misleading labels, which the oracle inevitably provides in practice. To address this issue, we propose a multi-turn interactive active learning strategy. This strategy selects data based on models' semantic uncertainty to form a solid knowledge foundation more effectively and actively requests reannotation from an oracle…
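As a rough illustration of selecting data by semantic uncertainty, the sketch below scores each question by the entropy of its sampled answers after grouping answers that mean the same thing, then picks the highest-scoring questions. This is an assumption-laden stand-in, not the authors' method: the abstract shown here does not specify how uncertainty is computed. The names `normalize`, `semantic_entropy`, and `select_most_uncertain` are hypothetical, and the grouping rule (case, whitespace, and trailing-period normalization) is a crude placeholder for a real semantic-equivalence check.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Placeholder equivalence rule (assumption): lowercase, drop a trailing period, collapse spaces."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def semantic_entropy(sampled_answers):
    """Entropy over groups of equivalent answers sampled from a model for one question."""
    clusters = Counter(normalize(a) for a in sampled_answers)
    total = sum(clusters.values())
    # Sum of p * log(1/p) over answer groups; 0.0 when every sample agrees.
    return sum((c / total) * math.log(total / c) for c in clusters.values())


def select_most_uncertain(pool, k):
    """Return the ids of the k questions whose sampled answers disagree the most.

    `pool` maps a question id to a list of answers sampled from the current model.
    """
    ranked = sorted(pool, key=lambda qid: semantic_entropy(pool[qid]), reverse=True)
    return ranked[:k]


# Hypothetical usage: q2's samples disagree, so it is selected first.
pool = {
    "q1": ["a chair", "A chair.", "a  chair"],
    "q2": ["on the table", "under the table"],
}
picked = select_most_uncertain(pool, 1)
```

In an active learning loop, the selected questions would be the ones sent to the oracle for labeling in the next round; grouping answers before computing entropy keeps surface-level rewordings of the same answer from being counted as disagreement.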
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Advanced X-ray and CT Imaging · Medical Image Segmentation Techniques
