3D Question Answering

Shuquan Ye; Dongdong Chen; Songfang Han; Jing Liao

arXiv:2112.08359·cs.CV·November 30, 2022

TL;DR

This paper introduces the first 3D question answering framework, extending visual question answering to 3D point cloud data, and demonstrates its effectiveness with a new dataset and transformer-based model.

Contribution

It presents a novel transformer-based 3DQA framework and the first 3DQA dataset, advancing AI's ability to understand 3D real-world scenes.

Findings

01

The 3DQA-TR framework outperforms existing VQA methods on the ScanQA dataset.

02

The proposed model effectively integrates appearance, geometry, and language information.

03

The ScanQA dataset contains around 6,000 questions and 30,000 answers across 806 scenes.

Abstract

Visual Question Answering (VQA) has witnessed tremendous progress in recent years. However, most efforts focus only on 2D image question answering tasks. In this paper, we present the first attempt at extending VQA to the 3D domain, which can facilitate artificial intelligence's perception of 3D real-world scenarios. Different from image-based VQA, 3D Question Answering (3DQA) takes the color point cloud as input and requires both appearance and 3D geometry comprehension ability to answer the 3D-related questions. To this end, we propose a novel transformer-based 3DQA framework "3DQA-TR", which consists of two encoders for exploiting the appearance and geometry information, respectively. The multi-modal information of appearance, geometry, and the linguistic question can finally attend to each other via a 3D-Linguistic Bert to predict the target answers. To verify the effectiveness…
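The fusion step described in the abstract, where appearance, geometry, and question tokens attend to each other, can be illustrated with a toy sketch. This is not the 3DQA-TR implementation: the abstract does not give the internals of the encoders or of the 3D-Linguistic Bert, so the code below assumes random vectors in place of encoder outputs, a single attention head, and identity query/key/value projections. All function names (`fuse`, `self_attention`, `random_tokens`) and sizes are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def fuse(appearance_tokens, geometry_tokens, question_tokens):
    """Concatenate the three token sets so every token can attend to every other."""
    tokens = question_tokens + appearance_tokens + geometry_tokens
    return self_attention(tokens)


def random_tokens(n, d, rng):
    """Hypothetical stand-in for encoder outputs: n random d-dimensional vectors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
D = 8                                   # illustrative feature size
appearance = random_tokens(4, D, rng)   # e.g. one token per detected object
geometry = random_tokens(4, D, rng)     # matching geometry token per object
question = random_tokens(5, D, rng)     # one token per question word

fused = fuse(appearance, geometry, question)
# Mean-pool the fused tokens into one vector that an answer head could consume.
pooled = [sum(t[i] for t in fused) / len(fused) for i in range(D)]
```

Concatenating the three token sets before attention is one simple way to let the modalities interact; the actual model may combine them differently.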

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Layer Normalization · Weight Decay