Object Activity Scene Description, Construction and Recognition
Hui Feng, Shanshan Wang, Shuzhi Sam Ge

TL;DR
This paper introduces a method for recognizing complex activity scenes: it partitions each scene into primitive actions, describes them with joint trajectories, and classifies them with a CNN inspired by text classification techniques. Experiments on a human activity dataset are reported to support the approach.
Contribution
The paper proposes a new approach combining motion attention, primitive action partitioning, and CNN-based scene recognition inspired by text classification.
Findings
Effective scene recognition on a human activity dataset
Primitive action partitioning improves recognition accuracy
CNN approach outperforms traditional methods
Abstract
Action recognition is a critical task for social robots that must engage meaningfully with their environment. 3D human skeleton-based action recognition has become an attractive research area in recent years. Although existing approaches are good at recognizing individual actions, recognizing a group of actions in an activity scene remains a great challenge. To tackle this problem, we first partition the scene into several primitive actions (PAs) based on a motion attention mechanism. The primitive actions are then described by the trajectory vectors of the corresponding joints. Finally, motivated by text classification based on word embeddings, we employ a convolutional neural network (CNN) to recognize activity scenes, treating the motion of joints as the "words" of an activity. Experimental results on the scenes of a human activity dataset show the efficiency of the proposed approach.
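The three stages named in the abstract (partitioning, trajectory description, text-style convolution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not specify the motion attention mechanism, so a simple motion-energy threshold stands in for it, and the function names `trajectory_vectors`, `partition`, and `conv_max_pool` are hypothetical. The convolution step mimics the filter-plus-max-over-time-pooling pattern commonly used in CNN text classifiers, with each per-frame trajectory vector playing the role of a word embedding.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # one 3D joint position
Frame = Sequence[Joint]              # all joints in one skeleton frame


def trajectory_vectors(frames: Sequence[Frame]) -> List[List[float]]:
    """Frame-to-frame displacement of every joint, flattened to one vector per step."""
    vectors = []
    for prev, cur in zip(frames, frames[1:]):
        vec: List[float] = []
        for (x0, y0, z0), (x1, y1, z1) in zip(prev, cur):
            vec.extend([x1 - x0, y1 - y0, z1 - z0])
        vectors.append(vec)
    return vectors


def motion_energy(vec: Sequence[float]) -> float:
    """Sum of squared displacements; a stand-in (assumed) attention score."""
    return sum(v * v for v in vec)


def partition(vectors: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """Split the sequence into half-open (start, end) runs whose energy exceeds threshold.

    Assumption: a threshold on motion energy replaces the paper's
    unspecified motion attention mechanism.
    """
    segments = []
    start = None
    for i, vec in enumerate(vectors):
        active = motion_energy(vec) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(vectors)))
    return segments


def conv_max_pool(vectors: Sequence[Sequence[float]],
                  kernel: Sequence[Sequence[float]]) -> float:
    """Slide one temporal filter over the sequence, apply ReLU, take the max over time."""
    width = len(kernel)
    best = 0.0  # starting at 0 folds the ReLU into the max
    for t in range(len(vectors) - width + 1):
        window = vectors[t:t + width]
        response = sum(k * v
                       for row_k, row_v in zip(kernel, window)
                       for k, v in zip(row_k, row_v))
        best = max(best, response)
    return best


# Toy scene: a single joint that rests, moves along x for two steps, then rests.
frames = [
    [(0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0)],
]
vectors = trajectory_vectors(frames)
segments = partition(vectors, threshold=0.5)
feature = conv_max_pool(vectors, kernel=[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

In a full classifier, many such filters of different temporal widths would each contribute one pooled feature, and the resulting feature vector would feed a learned output layer; the sketch shows a single hand-set filter only.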
Taxonomy
Methods: Convolution
