Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
Ting Lei, Fabian Caba, Qingchao Chen, Hailin Jin, Yuxin Peng, Yang Liu

TL;DR
This paper introduces ADA-CM, an efficient adaptive HOI detection method that leverages pre-trained models and concept-guided memory to handle long-tailed data and improve performance with less training time.
Contribution
The paper proposes ADA-CM, a novel HOI detector that operates in training-free and fine-tuning modes, utilizing concept-guided memory and pre-trained models for improved efficiency and effectiveness.
Findings
Achieves competitive results on HICO-DET and V-COCO datasets.
Requires significantly less training time than state-of-the-art methods.
Operates effectively in both training-free and fine-tuning modes.
Abstract
Human-Object Interaction (HOI) detection aims to localize humans and objects and infer the relationships between them. Arguably, training supervised models for this task from scratch is challenging: performance drops on rare classes, and handling the long-tailed distribution of HOIs in complex, realistic scenes demands high computational cost and training time. This observation motivates us to design an HOI detector that can be trained even with long-tailed labeled data and can leverage existing knowledge from pre-trained models. Inspired by the powerful generalization ability of large Vision-Language Models (VLMs) on classification and retrieval tasks, we propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM). ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm. Its…
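The abstract is truncated here, so the exact construction of the concept-guided memory is not available in this summary. As a rough illustration of what "tunable without learning new parameters" can mean, the sketch below shows a generic key-value memory classifier: labeled features are stored as keys, and a query is scored by its similarity to them. This is not the authors' implementation. The class name MemoryClassifier, the sharpness parameter, and the per-class averaging (one simple way to keep frequent classes from dominating rare ones by count alone) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class MemoryClassifier:
    """Hypothetical key-value memory: keys are features, values are class ids.

    Adding an entry only appends to a list, so nothing is trained.
    """

    def __init__(self, num_classes: int, sharpness: float = 5.0) -> None:
        self.num_classes = num_classes
        self.sharpness = sharpness  # assumed hyperparameter, not from the paper
        self.entries: List[Tuple[Vector, int]] = []

    def add(self, feature: Vector, label: int) -> None:
        """Store one labeled feature vector in the memory."""
        self.entries.append((feature, label))

    def scores(self, query: Vector) -> List[float]:
        """Return the mean similarity-based vote for each class."""
        totals = [0.0] * self.num_classes
        counts = [0] * self.num_classes
        for key, label in self.entries:
            # Votes decay exponentially as cosine similarity drops below 1.
            vote = math.exp(-self.sharpness * (1.0 - cosine(query, key)))
            totals[label] += vote
            counts[label] += 1
        # Average per class (assumption) so class frequency alone does not win.
        return [t / c if c else 0.0 for t, c in zip(totals, counts)]

    def predict(self, query: Vector) -> int:
        """Return the class id with the highest averaged vote."""
        s = self.scores(query)
        return max(range(self.num_classes), key=s.__getitem__)


# Toy long-tailed memory: three examples of class 0, one of class 1.
memory = MemoryClassifier(num_classes=2)
memory.add([1.0, 0.0], 0)
memory.add([0.9, 0.1], 0)
memory.add([0.8, 0.2], 0)
memory.add([0.0, 1.0], 1)
print(memory.predict([0.1, 0.9]))
```

In a real pipeline the feature vectors would come from a pre-trained encoder rather than hand-written lists, and the fine-tuning mode mentioned in the abstract would presumably make some of these stored values learnable; both points go beyond what this summary states.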
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Adapter
