ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences

Zhizhong Han; Chao Chen; Yu-Shen Liu; Matthias Zwicker

arXiv:1908.00120 · cs.CV · August 2, 2019 · 1 cites

Open Access

TL;DR

ShapeCaptioner is a novel generative network that improves 3D shape captioning by learning to map part detections from multiple views to detailed descriptive sentences, surpassing previous methods.

Contribution

It introduces a new approach that learns part detection knowledge from 3D segmentations and transfers it to enhance caption generation for 3D shapes.

Findings

01. Outperforms previous 3D shape captioning methods

02. Learns detailed part-level features for better descriptions

03. Uses a novel part class specific aggregation technique

Abstract

3D shape captioning is a challenging application in 3D shape understanding. Captions from recent multi-view based methods reveal that they cannot capture part-level characteristics of 3D shapes. This leads to a lack of detailed part-level description in captions, which humans tend to focus on. To resolve this issue, we propose ShapeCaptioner, a generative caption network, to perform 3D shape captioning from semantic parts detected in multiple views. Our novelty lies in learning the knowledge of part detection in multiple views from 3D shape segmentations and transferring this knowledge to facilitate learning the mapping from 3D shapes to sentences. Specifically, ShapeCaptioner aggregates the parts detected in multiple colored views using our novel part class specific aggregation to represent a 3D shape, and then employs a sequence-to-sequence model to generate the caption. Our…
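As a rough illustration of the aggregation step described in the abstract, the sketch below groups part features detected across views by part class and pools each group into one vector, giving a fixed-order sequence that a sequence-to-sequence model could consume. The abstract does not state the pooling operator or the feature layout, so element-wise max pooling, the zero vector for undetected classes, the function name `aggregate_by_part_class`, and the toy features are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def aggregate_by_part_class(detections, part_classes, feat_dim):
    """Pool part features per class (hypothetical helper, not from the paper).

    detections: list of (part_class, feature_vector) pairs gathered from all views.
    part_classes: fixed ordering of part classes for the output sequence.
    Returns one pooled vector per class; undetected classes get a zero vector.
    """
    grouped = defaultdict(list)
    for part_class, feature in detections:
        if len(feature) != feat_dim:
            raise ValueError("feature has wrong dimension")
        grouped[part_class].append(feature)

    sequence = []
    for part_class in part_classes:
        feats = grouped.get(part_class)
        if feats:
            # Assumption: element-wise max pooling over all parts of this class.
            pooled = [max(column) for column in zip(*feats)]
        else:
            # Assumption: a class with no detection is represented by zeros.
            pooled = [0.0] * feat_dim
        sequence.append(pooled)
    return sequence


# Made-up detections from several views of one chair-like shape.
detections = [
    ("leg", [0.2, 0.9, 0.1]),
    ("leg", [0.7, 0.3, 0.4]),
    ("seat", [0.5, 0.5, 0.5]),
]
shape_repr = aggregate_by_part_class(detections, ["back", "seat", "leg"], feat_dim=3)
```

Here `shape_repr` holds one vector per part class in the order `back`, `seat`, `leg`. In a full pipeline this sequence would be the encoder input of the caption generator, which is omitted from the sketch.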

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization