PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

Amrin Kareem, Jean Lahoud, and Hisham Cholakkal

arXiv:2404.03836 · cs.CV · April 8, 2024 · 1 citation

TL;DR

This paper introduces reasoning-based 3D part segmentation, a task in which a model interprets implicit textual queries to segment parts of 3D objects and generate explanations. The task is supported by a curated dataset of over 60k instructions and a model built on a large multimodal model.

Contribution

The paper presents a new reasoning-based 3D part segmentation task, a dataset of over 60k instructions paired with ground-truth part annotations, and a model that interprets implicit queries and reasons about 3D object parts.

Findings

01. Achieves competitive segmentation performance with implicit queries.

02. Can generate natural language explanations for its segmentations.

03. Demonstrates reasoning and integration of world knowledge.

Abstract

Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still heavily rely on explicit human instruction to identify target objects or categories, lacking the capability to actively reason and comprehend implicit user intentions. We introduce a novel segmentation task known as reasoning part segmentation for 3D objects, aiming to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations specifically curated for reasoning-based 3D part segmentation. We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and…
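To make the task setup concrete, the following is a minimal sketch of how one instruction paired with a ground-truth part mask might be represented and scored. It is an illustration under stated assumptions, not the authors' data format or evaluation protocol: the `ReasoningSample` record, the example query, the per-point boolean mask layout, and the use of intersection-over-union as the score are all hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReasoningSample:
    """Hypothetical record: one implicit query paired with a per-point part mask."""
    query: str
    gt_mask: List[bool]  # True where a point belongs to the queried part


def mask_iou(pred: List[bool], gt: List[bool]) -> float:
    """Intersection-over-union between two per-point boolean masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must cover the same points")
    inter = sum(p and g for p, g in zip(pred, gt))
    union = sum(p or g for p, g in zip(pred, gt))
    # Two empty masks agree perfectly; this convention is an assumption.
    return inter / union if union else 1.0


# Illustrative query and toy 5-point object (both invented for this sketch).
sample = ReasoningSample(
    query="Which part would you hold to lift this kettle?",
    gt_mask=[True, True, False, False, True],
)
pred_mask = [True, False, False, True, True]
score = mask_iou(pred_mask, sample.gt_mask)
print(f"IoU for {sample.query!r}: {score:.2f}")
```

The query never names the part explicitly, which is the sense in which the abstract calls such queries implicit; a model has to infer the intended part before producing the mask.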

Code & Models

Repositories

amrinkareem/paris3d (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Neural Network Applications · Handwritten Text Recognition Techniques