Human-centric Relation Segmentation: Dataset and Solution
Si Liu, Zitian Wang, Yulu Gao, Lejian Ren, Yue Liao, Guanghui Ren, Bo Li, Shuicheng Yan

TL;DR
This paper introduces human-centric relation segmentation (HRS), a fine-grained task combining relation detection and pixel-level segmentation, along with a new dataset and a real-time segmentation framework to improve robotic understanding of human-object interactions.
Contribution
It presents a new HRS task, a large annotated dataset (PIC), and a novel SMS framework that achieves real-time performance for fine-grained human-centric relation understanding.
Findings
SMS outperforms baselines in accuracy.
The PIC dataset contains 17,122 images with detailed annotations.
SMS achieves real-time inference at 36 FPS.
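For a sense of scale, the 36 FPS figure can be converted into a per-frame time budget. This assumes FPS here measures end-to-end throughput on a single image stream, which the summary does not state explicitly:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{36\ \text{frames}} \approx 27.8\ \text{ms per frame}
```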
Abstract
Vision and language understanding techniques have achieved remarkable progress, but problems involving very fine-grained details remain difficult. For example, when a robot is told to "bring me the book in the girl's left hand", most existing methods would fail if the girl holds a book in each hand. In this work, we introduce a new task named human-centric relation segmentation (HRS), a fine-grained case of human-object interaction detection (HOI-det). HRS aims to predict the relations between a human and the surrounding entities and to identify the relation-correlated human parts, which are represented as pixel-level masks. For the example above, our HRS task produces results in the form of relation triplets <girl [left hand], hold, book> and exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task. Correspondingly,…
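To make the output format described in the abstract concrete, here is a minimal sketch of how an HRS prediction, a <human [part], relation, object> triplet paired with pixel masks, might be represented and queried to resolve "the book in the girl's left hand". The class `HRSTriplet`, its field names, and the helpers `find_object_mask` and `centroid` are hypothetical illustrations, not the paper's data format or the PIC annotation schema. For simplicity, masks are stored as sets of (row, col) pixel coordinates; a real system would more likely use boolean arrays.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of one mask pixel


@dataclass(frozen=True)
class HRSTriplet:
    """Hypothetical HRS prediction: <subject [part], relation, object> plus masks."""
    subject: str                     # e.g. "girl"
    part: str                        # relation-correlated human part, e.g. "left hand"
    relation: str                    # e.g. "hold"
    obj: str                         # e.g. "book"
    part_mask: FrozenSet[Pixel]      # pixels of the human part
    object_mask: FrozenSet[Pixel]    # pixels of the object instance


def find_object_mask(triplets: List[HRSTriplet], subject: str, part: str,
                     relation: str, obj: str) -> Optional[FrozenSet[Pixel]]:
    """Return the object mask of the first triplet matching the query, else None."""
    for t in triplets:
        if (t.subject, t.part, t.relation, t.obj) == (subject, part, relation, obj):
            return t.object_mask
    return None


def centroid(mask: FrozenSet[Pixel]) -> Tuple[float, float]:
    """Mean (row, col) of a non-empty mask, e.g. a rough grasp target."""
    rows, cols = zip(*mask)
    return sum(rows) / len(mask), sum(cols) / len(mask)


# Toy scene: the girl holds one book in each hand (made-up pixel data).
left_book = frozenset({(10, 2), (10, 3), (11, 2), (11, 3)})
right_book = frozenset({(10, 20), (10, 21), (11, 20), (11, 21)})
preds = [
    HRSTriplet("girl", "left hand", "hold", "book", frozenset({(12, 2)}), left_book),
    HRSTriplet("girl", "right hand", "hold", "book", frozenset({(12, 20)}), right_book),
]

target = find_object_mask(preds, "girl", "left hand", "hold", "book")
print(centroid(target))  # (10.5, 2.5)
```

The part name in each triplet is what disambiguates the two otherwise identical <girl, hold, book> relations; without it, a plain HOI-det output could not tell the robot which book to pick up.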
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
