Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, and Bryan Russell

arXiv:2110.03562·cs.CV·October 8, 2021

TL;DR

This paper presents a weakly supervised approach for detecting human-object interactions in videos. It pairs a contrastive training loss, which associates spatiotemporal regions with an action and object vocabulary, with a new semi-automatically curated dataset, and it improves over weakly supervised baselines adapted to the task.

Contribution

The work introduces a contrastive weakly supervised training loss and a semi-automatically curated dataset for human-object interaction detection in videos.

Findings

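01

Improved performance over weakly supervised baselines adapted to the task.

02

Spatiotemporal regions are associated with an action and object vocabulary, with temporal continuity of object appearance used as self-supervision.

03

A new dataset of over 6.5k videos, with interaction annotations semi-automatically curated from sentence captions.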

Abstract

We introduce the task of weakly supervised learning for detecting human and object interactions in videos. Our task poses unique challenges as a system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object. To address these challenges, we introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage temporal continuity of the visual appearance of moving objects as a form of self-supervision. To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos. We demonstrate improved performance over weakly supervised baselines adapted to our task on our…
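The association step described in the abstract can be illustrated with a generic contrastive objective. The following is a minimal sketch, not the paper's loss: it assumes an InfoNCE-style formulation, max-pooling over regions as the video-level score, and toy 2-D embeddings. The names `cosine`, `video_score`, `contrastive_loss`, and `temperature` are hypothetical and do not come from the paper or its repository.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors (plain Python lists).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def video_score(region_feats, word_vec):
    # Assumption: only a video-level label is known, so the best-matching
    # region stands in for the whole video (multiple-instance style).
    return max(cosine(region, word_vec) for region in region_feats)

def contrastive_loss(region_feats, pos_word, neg_words, temperature=0.1):
    # InfoNCE-style loss: low when the regions match the positive word
    # better than any of the negative words.
    pos = math.exp(video_score(region_feats, pos_word) / temperature)
    neg = sum(math.exp(video_score(region_feats, w) / temperature)
              for w in neg_words)
    return -math.log(pos / (pos + neg))

# Toy example: two region features and two candidate word embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
loss_match = contrastive_loss(regions, [1.0, 0.1], [[-1.0, -1.0]])
loss_mismatch = contrastive_loss(regions, [-1.0, -1.0], [[1.0, 0.1]])
```

In this toy setup `loss_match` is smaller than `loss_mismatch`, since one region aligns closely with the positive word in the first call. The temporal-continuity term mentioned in the abstract is not modeled here.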

Code & Models

Repositories

ShuangLI59/weakly-supervised-human-object-detection-video
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning