InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

Zifu Wan; Yaqi Xie; Ce Zhang; Zhiqiu Lin; Zihan Wang; Simon Stepputtis; Deva Ramanan; Katia Sycara

arXiv:2505.18291 · cs.CV · May 28, 2025

TL;DR

This paper introduces InstructPart, a new benchmark with annotations and instructions for evaluating and improving models' ability to understand and segment object parts in real-world tasks, highlighting current challenges and potential improvements.

Contribution

The paper presents a novel benchmark dataset, InstructPart, with hand-labeled part annotations and task-oriented instructions for part segmentation, and demonstrates a simple fine-tuning baseline that roughly doubles performance.

Findings

01. Task-oriented part segmentation is challenging for current vision-language models (VLMs).
02. Fine-tuning on the InstructPart dataset doubles performance.
03. The benchmark facilitates research in robotics, VR, and information retrieval.

Abstract

Large multimodal foundation models, particularly in the domains of language and vision, have significantly advanced various tasks, including robotics, autonomous driving, information retrieval, and grounding. However, many of these models perceive objects as indivisible, overlooking the components that constitute them. Understanding these components and their associated affordances provides valuable insights into an object's functionality, which is fundamental for performing a wide range of tasks. In this work, we introduce a novel real-world benchmark, InstructPart, comprising hand-labeled part segmentation annotations and task-oriented instructions to evaluate the performance of current models in understanding and executing part-level tasks within everyday contexts. Through our experiments, we demonstrate that task-oriented part segmentation remains a challenging problem, even for…
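The abstract does not say how predictions are scored. As a minimal sketch only, assuming each benchmark item pairs an image with a task-oriented instruction and a hand-labeled binary part mask, and assuming a predicted mask is compared to the ground truth with mask intersection-over-union (IoU), a metric commonly used for segmentation benchmarks, scoring could look like this. The `PartSample` record, its fields, and the helpers `mask_iou` and `mean_iou` are hypothetical; they are not the released dataset's schema or the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mask representation: a 2-D grid of booleans (True = part pixel).
Mask = List[List[bool]]


@dataclass
class PartSample:
    """Hypothetical benchmark record; field names are illustrative only."""
    image_id: str
    instruction: str  # task-oriented instruction describing the part to segment
    part_mask: Mask   # hand-labeled ground-truth part mask


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks with the same shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("predicted and ground-truth masks must have the same shape")
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += int(p and g)  # pixel labeled as part in both masks
            union += int(p or g)   # pixel labeled as part in either mask
    # Assumed convention: if both masks are empty, count it as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: List[Mask], samples: List[PartSample]) -> float:
    """Average of per-sample IoU over a list of benchmark samples."""
    if not samples or len(preds) != len(samples):
        raise ValueError("need one prediction per sample and at least one sample")
    return sum(mask_iou(p, s.part_mask) for p, s in zip(preds, samples)) / len(samples)


if __name__ == "__main__":
    gt = [[False, True, True],
          [False, True, False]]
    pred = [[False, True, False],
            [False, True, True]]
    # Hypothetical instruction text, for illustration only.
    sample = PartSample("img_0", "Which part should I grip to pour from this kettle?", gt)
    print(mask_iou(pred, gt))                      # 2 shared pixels / 4 in union = 0.5
    print(mean_iou([pred, gt], [sample, sample]))  # (0.5 + 1.0) / 2 = 0.75
```

Averaging per-sample IoU weights every instruction equally; a cumulative variant that sums intersections and unions over the whole set before dividing would instead give large parts more influence on the score.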

Code & Models

Datasets

zifuwan/InstructPart · dataset · 52 downloads

Videos

InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning · Underline