End-to-End Human Object Interaction Detection with HOI Transformer

Cheng Zou; Bohan Wang; Yue Hu; Junqi Liu; Qian Wu; Yu Zhao; Boxun Li; Chenguang Zhang; Chi Zhang; Yichen Wei; Jian Sun

arXiv:2103.04503 · cs.CV · March 9, 2021 · 5 cites

TL;DR

HOI Transformer is an end-to-end approach to human object interaction (HOI) detection that predicts HOI instances directly from global image context, reaching 26.61% AP on HICO-DET and 52.9% AP_role on V-COCO with a simpler pipeline than previous methods.

Contribution

The paper presents HOI Transformer, a novel end-to-end model that eliminates hand-designed components and directly predicts HOI instances using global context reasoning.

Findings

01 Achieves 26.61% AP on HICO-DET

02 Achieves 52.9% AP_role on V-COCO

03 Surpasses previous methods on both benchmarks while being conceptually simpler

Abstract

We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner. Current approaches either decouple the HOI task into separate stages of object detection and interaction classification or introduce a surrogate interaction problem. In contrast, our method, named HOI Transformer, streamlines the HOI pipeline by eliminating the need for many hand-designed components. HOI Transformer reasons about the relations of objects and humans from global image context and directly predicts HOI instances in parallel. A quintuple matching loss is introduced to force HOI predictions in a unified way. Our method is conceptually much simpler and demonstrates improved accuracy. Without bells and whistles, HOI Transformer achieves $26.61\%$ $AP$ on HICO-DET and $52.9\%$ $AP_{role}$ on V-COCO, surpassing previous methods with the advantage of being much simpler. We hope…
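
The matching step behind a set-prediction loss of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not spell out the quintuple's fields or the cost terms, so the sketch assumes each HOI instance carries a human box, an object box, an object class, and an action class, and it uses an unweighted sum of L1 box distances and (1 - probability) class costs. The names `pair_cost` and `match` are hypothetical, and the optimal assignment is found by brute force over permutations rather than with the Hungarian algorithm, which is only practical for a handful of instances.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two boxes given as equal-length coordinate lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_cost(pred, gt):
    """Assumed cost of assigning one prediction to one ground-truth HOI instance."""
    box_cost = (l1(pred["human_box"], gt["human_box"])
                + l1(pred["object_box"], gt["object_box"]))
    cls_cost = ((1.0 - pred["object_probs"][gt["object"]])
                + (1.0 - pred["action_probs"][gt["action"]]))
    return box_cost + cls_cost


def match(preds, gts):
    """Return ([(pred_idx, gt_idx), ...], total_cost) for the cheapest one-to-one
    assignment. Brute force; requires len(preds) >= len(gts)."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, g) for g, p in enumerate(best)], best_cost


# Toy data: three parallel predictions, two ground-truth instances.
preds = [
    {"human_box": [0.1, 0.1, 0.2, 0.2], "object_box": [0.5, 0.5, 0.2, 0.2],
     "object_probs": {"ball": 0.9, "bike": 0.1},
     "action_probs": {"kick": 0.8, "ride": 0.2}},
    {"human_box": [0.7, 0.7, 0.2, 0.2], "object_box": [0.8, 0.8, 0.1, 0.1],
     "object_probs": {"ball": 0.2, "bike": 0.8},
     "action_probs": {"kick": 0.1, "ride": 0.9}},
    {"human_box": [0.4, 0.4, 0.4, 0.4], "object_box": [0.4, 0.4, 0.4, 0.4],
     "object_probs": {"ball": 0.5, "bike": 0.5},
     "action_probs": {"kick": 0.5, "ride": 0.5}},
]
gts = [
    {"human_box": [0.7, 0.7, 0.2, 0.2], "object_box": [0.8, 0.8, 0.1, 0.1],
     "object": "bike", "action": "ride"},
    {"human_box": [0.1, 0.1, 0.2, 0.2], "object_box": [0.5, 0.5, 0.2, 0.2],
     "object": "ball", "action": "kick"},
]

pairs, total = match(preds, gts)
print(pairs, round(total, 3))
```

In a training loop, the matched pairs would receive box and classification losses, while unmatched predictions (index 2 here) would typically be supervised toward a "no interaction" target; that last detail is likewise an assumption borrowed from common set-prediction practice rather than something stated in the abstract.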

Code & Models

Repositories

bbepoch/HoiTransformer (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer