Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

Bo Wan; Yongfei Liu; Desen Zhou; Tinne Tuytelaars; Xuming He

arXiv:2303.01313·cs.CV·March 3, 2023·5 cites

TL;DR

This paper introduces a CLIP-guided weakly-supervised HOI detection method that leverages prior knowledge and a self-taught mechanism to improve human-object interaction recognition from image-level annotations.

Contribution

It develops a novel CLIP-guided HOI representation and a self-taught pruning strategy to enhance weakly-supervised HOI detection performance.

Findings

01. Outperforms previous methods on HICO-DET and V-COCO datasets
02. Effectively incorporates prior knowledge at image and instance levels
03. Demonstrates significant improvement in weakly-supervised HOI detection

Abstract

Human-object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building block for many vision tasks. One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only. This is inherently challenging due to ambiguous human-object associations, a large search space for detecting HOIs, and a highly noisy training signal. A promising strategy to address these challenges is to exploit knowledge from large-scale pretrained models (e.g., CLIP), but a direct knowledge distillation strategy (Liao et al., 2022) does not perform well in the weakly-supervised setting. In contrast, we develop a CLIP-guided HOI representation capable of incorporating the prior knowledge at both image level and HOI instance level, and adopt a self-taught mechanism to prune incorrect human-object…
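
To make the weak-supervision setting concrete, the sketch below shows one generic way to train from image-level annotations only: per-pair interaction scores are max-pooled into one score per interaction class, and that pooled score is compared with the image-level label. This is a standard multiple-instance-learning illustration, not the paper's method; the function names, the toy scores, and the three-class label layout are all assumptions made for the example.

```python
import math

def image_level_scores(pair_scores):
    """Max-pool per-pair interaction probabilities into one score per class.

    pair_scores: list of human-object pairs, each a list of per-class
    probabilities in [0, 1] (hypothetical model outputs).
    """
    num_classes = len(pair_scores[0])
    return [max(pair[c] for pair in pair_scores) for c in range(num_classes)]

def bce_loss(scores, labels, eps=1e-7):
    """Mean binary cross-entropy between pooled scores and image-level labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(s) + (1 - y) * math.log(1.0 - s))
    return total / len(scores)

# Toy image with two candidate human-object pairs and three interaction classes.
pair_scores = [
    [0.9, 0.1, 0.2],  # pair 0 (assumed values)
    [0.3, 0.2, 0.7],  # pair 1 (assumed values)
]
image_labels = [1, 0, 1]  # only image-level labels, no pair-level boxes

pooled = image_level_scores(pair_scores)
loss = bce_loss(pooled, image_labels)
```

Because the label says only that an interaction occurs somewhere in the image, the gradient flows through whichever pair scores highest for each class. That is one source of the ambiguous human-object associations and noisy training signal the abstract mentions.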

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Knowledge Distillation