Learn 3D VQA Better with Active Selection and Reannotation

Shengli Zhou; Yang Liu; Feng Zheng

arXiv:2507.04630 · cs.CV · August 19, 2025

TL;DR

This paper introduces a multi-turn active learning approach for 3D Visual Question Answering that effectively identifies and reannotates misleading data, improving model performance while reducing training costs.

Contribution

The paper proposes an active learning strategy that selects training data by the model's semantic uncertainty and requests reannotation from an oracle to correct misleading labels in 3D VQA datasets, which improves training efficiency.

Findings

01 Improved 3D VQA accuracy with less training data

02 Halved training costs for high-accuracy models

03 Effective identification and correction of misleading annotations

Abstract

3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scene data enlarges the negative effect of misleading annotations. Although active learning strategies can select valuable instances for training, they fail to identify and resolve misleading labels, which the oracle inevitably provides in practice. To address this issue, we propose a multi-turn interactive active learning strategy. This strategy selects data based on models' semantic uncertainty to form a solid knowledge foundation more effectively and actively requests reannotation from an oracle…
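The abstract does not say how semantic uncertainty is measured, so the following is only a minimal sketch of uncertainty-based selection under stated assumptions, not the authors' method. It assumes that several answers have been sampled from a model for each question, groups them into meaning clusters with a deliberately crude text normalizer (a hypothetical stand-in for a real semantic-equivalence check), and ranks questions by the entropy over those clusters. The names `normalize`, `semantic_entropy`, `select_most_uncertain`, and the toy `pool` are all illustrative.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Crude, hypothetical proxy for semantic clustering:
    lowercase, strip punctuation, and drop English articles."""
    words = [w.strip(".,!?") for w in answer.lower().split()]
    return " ".join(w for w in words if w and w not in {"a", "an", "the"})


def semantic_entropy(sampled_answers):
    """Shannon entropy (in nats) over clusters of equivalent answers."""
    clusters = Counter(normalize(a) for a in sampled_answers)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())


def select_most_uncertain(pool, k):
    """Return the k question ids whose sampled answers disagree the most."""
    ranked = sorted(pool, key=lambda qid: semantic_entropy(pool[qid]), reverse=True)
    return ranked[:k]


# Toy pool (assumed data): question id -> answers sampled from a model.
pool = {
    "q1": ["A chair", "the chair", "chair."],           # one meaning, entropy 0
    "q2": ["left", "right", "behind the sofa"],         # three distinct meanings
    "q3": ["two", "two", "three"],                      # partial disagreement
}
selected = select_most_uncertain(pool, 2)
print(selected)
```

In this toy pool, the three phrasings of "chair" collapse into one cluster, so `q1` counts as certain even though the surface strings differ. That is the reason to measure uncertainty over meanings rather than raw strings when answers are free-form. A reannotation step, in which the oracle is asked to revisit a suspicious label, would sit on top of such a ranking and is not shown here.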


Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced X-ray and CT Imaging · Medical Image Segmentation Techniques