MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

Yang Jiao; Shaoxiang Chen; Zequn Jie; Jingjing Chen; Lin Ma; Yu-Gang Jiang

arXiv:2203.05203 · cs.CV · July 21, 2022 · 1 citation

TL;DR

This paper introduces MORE, a novel multi-order relation mining model that enhances 3D dense captioning by capturing complex inter-object relations in point clouds, leading to more descriptive scene captions.

Contribution

The paper proposes a new model, MORE, that encodes multi-order relations in 3D scenes using a novel graph convolution and triplet attention, improving caption quality.

Findings

01 Outperforms the prior state of the art on the Scan2Cap dataset.

02 Captures complex inter-object relations in 3D scenes.

03 Produces more descriptive and more accurate captions.

Abstract

3D dense captioning is a recently proposed task in which point clouds carry more geometric information than their 2D counterparts. However, it is also more challenging due to the higher complexity and wider variety of inter-object relations contained in point clouds. Existing methods treat such relations only as by-products of object feature learning in graphs, without specifically encoding them, which leads to sub-optimal results. In this paper, aiming to improve 3D dense captioning by capturing and utilizing the complex relations in the 3D scene, we propose MORE, a Multi-Order RElation mining model, to support generating more descriptive and comprehensive captions. Technically, MORE encodes object relations in a progressive manner, since complex relations can be deduced from a limited number of basic ones. We first devise a novel Spatial Layout Graph Convolution (SLGC), which…

Code & Models

Repositories

SxJyJay/MORE (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: Triplet Attention · Convolution