ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection

Ye Liu; Junsong Yuan; Chang Wen Chen

arXiv:2008.06254·cs.CV·March 29, 2022

TL;DR

ConsNet introduces a graph-based approach that leverages multi-level consistencies among objects, actions, and interactions to improve zero-shot and supervised human-object interaction detection.

Contribution

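The paper proposes ConsNet, a framework that encodes the relations among HOI components (objects, actions, and interactions) into a consistency graph and uses Graph Attention Networks (GATs) to propagate knowledge across it, improving detection, especially for unseen categories.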

Findings

1. Outperforms prior state-of-the-art methods on the V-COCO and HICO-DET datasets.

2. Effective in zero-shot HOI detection scenarios.

3. Uses both visual and semantic features for improved recognition.

Abstract

We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images. Most existing works treat HOIs as individual interaction categories and thus cannot handle the long-tail distribution and polysemy of action labels. We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs. Leveraging the compositional and relational peculiarities of HOI labels, we propose ConsNet, a knowledge-aware framework that explicitly encodes the relations among objects, actions and interactions into an undirected graph called the consistency graph, and exploits Graph Attention Networks (GATs) to propagate knowledge among HOI categories as well as their constituents. Our model takes visual features…
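To make the graph idea in the abstract concrete, here is a minimal sketch of an undirected graph over action, object, and interaction labels, followed by one single-head graph attention layer in the standard GAT formulation. It is an illustration under stated assumptions, not the authors' implementation: the edge rule (each interaction linked to its action and its object, plus an action-object link), the helper names `build_consistency_graph` and `gat_layer`, the random node embeddings, and the omission of the final nonlinearity are all hypothetical choices made for this example.

```python
import math
import random


def build_consistency_graph(hois):
    """Build an undirected graph from unique (action, object) pairs.

    Assumed edge rule (illustrative): every interaction node is linked to
    its action node and its object node, and the two are linked to each other.
    Returns (nodes, adj) where adj is a symmetric 0/1 matrix with self-loops.
    """
    actions = sorted({a for a, _ in hois})
    objects = sorted({o for _, o in hois})
    nodes = ([("action", a) for a in actions]
             + [("object", o) for o in objects]
             + [("interaction", f"{a} {o}") for a, o in hois])
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def connect(u, v):
        adj[index[u]][index[v]] = adj[index[v]][index[u]] = 1

    for a, o in hois:
        hoi = ("interaction", f"{a} {o}")
        connect(hoi, ("action", a))
        connect(hoi, ("object", o))
        connect(("action", a), ("object", o))
    return nodes, adj


def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention layer (no output nonlinearity).

    h: n x d_in node features, W: d_in x d_out projection,
    a: attention vector of length 2 * d_out.
    Returns (updated features, attention matrix).
    """
    n, d_out = len(h), len(W[0])
    # Linear projection z_i = h_i W.
    z = [[sum(h[i][k] * W[k][c] for k in range(len(W))) for c in range(d_out)]
         for i in range(n)]
    out, attn = [], []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        # Score e_ij = LeakyReLU(a . [z_i || z_j]) for each neighbour j.
        scores = []
        for j in nbrs:
            s = sum(w * x for w, x in zip(a, z[i] + z[j]))
            scores.append(s if s > 0 else slope * s)
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Aggregate neighbour projections weighted by attention.
        out.append([sum(al * z[j][c] for al, j in zip(alpha, nbrs))
                    for c in range(d_out)])
        row = [0.0] * n
        for al, j in zip(alpha, nbrs):
            row[j] = al
        attn.append(row)
    return out, attn


# Toy label set; random vectors stand in for real node embeddings.
hois = [("ride", "bicycle"), ("ride", "horse"), ("feed", "horse")]
nodes, adj = build_consistency_graph(hois)
rng = random.Random(0)
d_in, d_out = 4, 3
h = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in nodes]
W = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
a = [rng.gauss(0, 1) for _ in range(2 * d_out)]
out, attn = gat_layer(h, adj, W, a)
```

In this toy graph, the "ride horse" node attends to "ride", "horse", and itself, so information from seen interactions that share an action or object can flow toward a rare or unseen combination. That is the intuition behind propagating knowledge among HOI categories and their constituents.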
