Human-centric Relation Segmentation: Dataset and Solution

Si Liu; Zitian Wang; Yulu Gao; Lejian Ren; Yue Liao; Guanghui Ren; Bo Li; Shuicheng Yan

arXiv:2105.11168 · cs.CV · May 26, 2021 · 1 citation

Open Access

TL;DR

This paper introduces human-centric relation segmentation (HRS), a fine-grained task that combines relation detection with pixel-level segmentation. It also contributes a new dataset and a real-time segmentation framework, both aimed at improving robotic understanding of human-object interactions.

Contribution

It presents a new HRS task, a large annotated dataset (PIC), and a novel SMS framework that achieves real-time performance for fine-grained human-centric relation understanding.

Findings

01

SMS outperforms baselines in accuracy.

02

The PIC dataset contains 17,122 images with detailed annotations.

03

SMS achieves real-time inference at 36 FPS.

Abstract

Vision and language understanding techniques have achieved remarkable progress, but it remains difficult to handle problems involving very fine-grained details. For example, when a robot is told to "bring me the book in the girl's left hand", most existing methods would fail if the girl holds a book in each hand. In this work, we introduce a new task named human-centric relation segmentation (HRS) as a fine-grained case of human-object interaction detection (HOI-det). HRS aims to predict the relations between the human and surrounding entities and to identify the relation-correlated human parts, which are represented as pixel-level masks. For the example above, our HRS task produces results in the form of relation triplets <girl [left hand], hold, book> together with exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task. Correspondingly,…
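The abstract's running example can be made concrete with a small sketch of what an HRS prediction might look like as a data structure. This is an illustrative assumption, not the paper's output format: the class `HRSTriplet`, the helpers `find_object_mask` and `box`, and the sparse pixel-set mask representation are all hypothetical. The sketch shows why attaching a human part to the triplet matters: a part-aware query returns exactly one book mask, while a part-agnostic <girl, hold, book> query matches both books.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical mask representation: the set of (row, col) pixels it covers.
Mask = FrozenSet[Tuple[int, int]]


@dataclass(frozen=True)
class HRSTriplet:
    """Hypothetical container for one HRS prediction:
    <subject [human part], predicate, object> plus pixel-level masks."""
    subject: str       # e.g. "girl"
    human_part: str    # relation-correlated human part, e.g. "left hand"
    predicate: str     # e.g. "hold"
    obj: str           # object category, e.g. "book"
    subject_mask: Mask
    part_mask: Mask
    object_mask: Mask


def find_object_mask(preds: List[HRSTriplet], subject: str, part: str,
                     predicate: str, obj: str) -> Optional[Mask]:
    """Hypothetical helper: return the object mask for a part-specific query,
    or None if no triplet matches or the match is ambiguous."""
    matches = [p for p in preds
               if (p.subject, p.human_part, p.predicate, p.obj)
               == (subject, part, predicate, obj)]
    return matches[0].object_mask if len(matches) == 1 else None


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Toy rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


# Toy scene: a girl holding one book in each hand.
girl = box(0, 2, 10, 8)
preds = [
    HRSTriplet("girl", "left hand", "hold", "book",
               girl, box(5, 7, 6, 8), box(4, 8, 7, 10)),
    HRSTriplet("girl", "right hand", "hold", "book",
               girl, box(5, 2, 6, 3), box(4, 0, 7, 2)),
]

# The part-aware query resolves to exactly one book mask (3 x 2 pixels).
left_book = find_object_mask(preds, "girl", "left hand", "hold", "book")
print(len(left_book))  # prints 6

# A part-agnostic, HOI-style query <girl, hold, book> matches both triplets.
hoi_matches = [p for p in preds
               if (p.subject, p.predicate, p.obj) == ("girl", "hold", "book")]
print(len(hoi_matches))  # prints 2
```

Pixel sets keep the sketch dependency-free; a practical system would more likely store masks as dense binary arrays or run-length encodings.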

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition