A Grammatical Compositional Model for Video Action Detection

Zhijun Zhang; Xu Zou; Jiahuan Zhou; Sheng Zhong; Ying Wu

arXiv:2310.02887 · cs.CV · October 5, 2023


TL;DR

This paper introduces a novel grammatical compositional model for video action detection that leverages hierarchical structures and neural networks to better understand complex human actions and interactions in videos.

Contribution

It proposes a new Grammatical Compositional Model based on And-Or graphs that combines grammar structures with deep features for improved action detection.

Findings

1. Outperforms existing methods on the AVA and Something-Else datasets.
2. Enhances interpretability through an inference parsing procedure.
3. Can be integrated into neural networks for end-to-end training.

Abstract

Analysis of human actions in videos demands understanding complex human dynamics, as well as the interaction between actors and context. However, these interaction relationships usually exhibit large intra-class variations from diverse human poses or object manipulations, and fine-grained inter-class differences between similar actions. Thus, the performance of existing methods is severely limited. Motivated by the observation that interactive actions can be decomposed into actor dynamics and participating objects or humans, we propose to investigate their composite property. In this paper, we present a novel Grammatical Compositional Model (GCM) for action detection based on typical And-Or graphs. Our model exploits the intrinsic structures and latent relationships of actions in a hierarchical manner to harness both the compositionality of grammar models and the capability of…
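The abstract builds on And-Or graphs. The snippet below is a minimal, generic sketch of the standard max-sum parse over such a graph, not the authors' GCM. An AND node adds its children's scores because every part must be present. An OR node keeps its highest-scoring child because exactly one alternative is chosen. The node names (`drink`, `raise_arm`, `cup`, ...) and the leaf scores are hypothetical stand-ins for the deep-feature scores a detector would supply. Learning and end-to-end training are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal"
    children: List["Node"] = field(default_factory=list)


def parse(node: Node, leaf_scores: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (best score, terminals in the best parse) for the subgraph at node."""
    if node.kind == "terminal":
        # Terminal score is assumed to come from an external (e.g. neural) detector.
        return leaf_scores[node.name], [node.name]
    results = [parse(child, leaf_scores) for child in node.children]
    if node.kind == "and":
        # AND: all parts are composed, so scores add and parses concatenate.
        total = sum(score for score, _ in results)
        parts = [t for _, terms in results for t in terms]
        return total, parts
    if node.kind == "or":
        # OR: choose the single best alternative.
        return max(results, key=lambda r: r[0])
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical action: "drink" = (actor dynamics) AND (participating object).
dynamics = Node("dynamics", "or", [Node("raise_arm", "terminal"), Node("tilt_head", "terminal")])
obj = Node("object", "or", [Node("cup", "terminal"), Node("bottle", "terminal")])
drink = Node("drink", "and", [dynamics, obj])

# Hypothetical leaf scores standing in for deep-feature outputs.
leaf_scores = {"raise_arm": 0.75, "tilt_head": 0.25, "cup": 0.5, "bottle": 0.25}

score, parse_tree = parse(drink, leaf_scores)
print(score, parse_tree)
```

With these scores the parse selects `raise_arm` and `cup` for a total of 1.25. The returned list of chosen terminals is what makes this kind of inference interpretable: it names which parts explain the detection.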


Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications