ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos

Meng-Jiun Chiou, Chun-Yu Liao, Li-Wei Wang, Roger Zimmermann, and Jiashi Feng

arXiv:2105.11731·cs.CV·June 25, 2021

TL;DR

This paper introduces ST-HOI, a spatial-temporal baseline for detecting human-object interactions in videos. It argues that temporal context is essential for recognizing such interactions and proposes a new benchmark dataset, VidHOI.

Contribution

The paper presents a simple yet effective spatial-temporal model for video HOI detection and introduces the VidHOI benchmark dataset.

Findings

01. Naive temporal-aware models face feature-inconsistency issues.

02. ST-HOI effectively utilizes trajectories and spatial-temporal features.

03. The proposed method outperforms static image-based approaches on VidHOI.

Abstract

Detecting human-object interactions (HOI) is an important step toward a comprehensive visual understanding of machines. While detecting non-temporal HOIs (e.g., sitting on a chair) from static images is feasible, it is unlikely even for humans to guess temporal-related HOIs (e.g., opening/closing a door) from a single video frame, where the neighboring frames play an essential role. However, conventional HOI methods operating on only static images have been used to predict temporal-related interactions, which is essentially guessing without temporal contexts and may lead to sub-optimal performance. In this paper, we bridge this gap by detecting video-based HOIs with explicit temporal information. We first show that a naive temporal-aware variant of a common action detection baseline does not work on video-based HOIs due to a feature-inconsistency issue. We then propose a simple yet…

Code & Models

Repositories

coldmanck/VidHOI (PyTorch, official)