Detecting Human-Object Interaction with Mixed Supervision
Suresh Kirthi Kumaraswamy (1), Miaojing Shi (2), Ewa Kijak (3) ((1) Univ Le Mans, CNRS, IRISA, (2) King's College London, (3) Univ Rennes, Inria, CNRS, IRISA)

TL;DR
This paper introduces a mixed-supervised approach for human-object interaction detection that effectively combines strong and weak supervision, utilizing a momentum-independent learning method and an HOI element swapping technique to enhance robustness and performance.
Contribution
It proposes a novel mixed-supervised HOI detection pipeline with a momentum-independent learning scheme and an HOI element swapping method to leverage both supervision types effectively.
Findings
Achieves performance comparable or superior to fully-supervised methods.
Outperforms existing weakly- and fully-supervised methods on HICO-DET.
Demonstrates robustness through diverse negative sample synthesis.
Abstract
Human-object interaction (HOI) detection is an important task in image understanding and reasoning. Its output takes the form of an HOI triplet <human; verb; object>, which requires bounding boxes for the human and the object as well as the action between them. In other words, the task requires strong supervision for training, which is hard to procure. A natural way to overcome this is weakly-supervised learning, where we know only that certain HOI triplets are present in an image, not their exact locations. Most weakly-supervised learning methods make no provision for leveraging strongly supervised data when it is available; indeed, a naïve combination of these two paradigms in HOI detection fails to make them contribute to each other. In this regard we propose a mixed-supervised HOI detection pipeline: thanks to a specific design of momentum-independent…
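To make the distinction between the two supervision types concrete, here is a minimal sketch of how a mixed-supervised annotation set might be represented. It is an illustration only, not the authors' data format: the names `HOIAnnotation`, `is_strong`, and `split_by_supervision`, and the (x1, y1, x2, y2) box convention, are all assumptions. A strongly supervised triplet carries both bounding boxes, while a weakly supervised one records only that the verb-object pair is present in the image.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOIAnnotation:
    """One <human; verb; object> triplet (hypothetical structure)."""
    verb: str
    object_category: str
    human_box: Optional[Box] = None   # missing under weak supervision
    object_box: Optional[Box] = None  # missing under weak supervision

    @property
    def is_strong(self) -> bool:
        # Strong supervision means both locations are annotated.
        return self.human_box is not None and self.object_box is not None


def split_by_supervision(
    annotations: List[HOIAnnotation],
) -> Tuple[List[HOIAnnotation], List[HOIAnnotation]]:
    """Partition annotations into (strong, weak) lists, preserving order."""
    strong = [a for a in annotations if a.is_strong]
    weak = [a for a in annotations if not a.is_strong]
    return strong, weak


if __name__ == "__main__":
    mixed = [
        HOIAnnotation("ride", "bicycle",
                      human_box=(10.0, 20.0, 110.0, 220.0),
                      object_box=(30.0, 120.0, 150.0, 260.0)),
        HOIAnnotation("hold", "cup"),  # image-level label only
    ]
    strong, weak = split_by_supervision(mixed)
    print(len(strong), "strong;", len(weak), "weak")
```

A training pipeline that mixes both types would presumably route the two partitions to different losses; how the paper does this is not recoverable from the truncated abstract, so it is left out of the sketch.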
