Neural-Logic Human-Object Interaction Detection
Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang

TL;DR
This paper introduces LogicHOI, a neural-logic reasoning-based Transformer model for human-object interaction detection that improves performance and zero-shot generalization by reasoning over interaction triplets and incorporating affordances and proxemics.
Contribution
It proposes a novel Transformer modification that enables reasoning over human-action-object triplets guided by logical properties, enhancing HOI detection and zero-shot capabilities.
Findings
Significant performance improvements on V-COCO and HICO-DET datasets.
Enhanced zero-shot generalization in HOI detection.
Effective incorporation of affordances and proxemics into the reasoning process.
Abstract
The interaction decoder used in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as inputs. Though achieving remarkable performance, such a paradigm lacks feasibility and cannot explore novel combinations over entities during decoding. We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and a Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the <human, action, object> triplet and constitute novel interactions. Meanwhile, this reasoning process is guided by two crucial properties for understanding HOI: affordances (the potential actions an object can facilitate) and proxemics (the spatial relations between humans and objects). We formulate these two properties in first-order logic and ground them into…
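To make the idea of grounding a logic rule concrete, here is a minimal sketch of how an affordance rule could be turned into a differentiable penalty. It is an illustration under stated assumptions, not the paper's formulation, since the abstract above is truncated before it describes the grounding. The rule "if action v is predicted for an object, then the object affords v", the Reichenbach fuzzy implication, the mean used to approximate the universal quantifier, and the names `implies`, `affordance_rule_truth`, `rule_loss`, and `AFFORDS` are all hypothetical choices made for this example.

```python
# Hypothetical affordance table: truth value of affords(object, action).
# The entries are made up for illustration; they are not taken from the paper.
AFFORDS = {"bicycle": {"ride": 1.0, "eat": 0.0, "hold": 1.0}}


def implies(a, b):
    """Reichenbach fuzzy implication: truth of (a -> b) for a, b in [0, 1]."""
    return 1.0 - a + a * b


def affordance_rule_truth(action_probs, affords):
    """Truth of 'for all actions v: predicted(v) -> affords(object, v)'.

    The universal quantifier is approximated by a mean over actions,
    which is one common relaxation and an assumption of this sketch.
    """
    truths = [implies(p, affords[v]) for v, p in action_probs.items()]
    return sum(truths) / len(truths)


def rule_loss(action_probs, obj):
    """Penalty that grows as predictions violate the affordance rule."""
    return 1.0 - affordance_rule_truth(action_probs, AFFORDS[obj])


# Predicted action probabilities for a detected <human, ?, bicycle> pair.
plausible = {"ride": 0.9, "eat": 0.05, "hold": 0.3}
implausible = {"ride": 0.1, "eat": 0.9, "hold": 0.3}

loss_plausible = rule_loss(plausible, "bicycle")
loss_implausible = rule_loss(implausible, "bicycle")
```

In this sketch, a confident prediction of "eat" for a bicycle yields a larger penalty than a confident prediction of "ride", so adding such a term to a training objective would push the model away from infeasible interactions. A proxemics rule could be relaxed in the same way, with spatial predicates in place of the affordance table.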
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
Methods: Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing
