Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition
Yao Liu, Gangfeng Cui, Jiahui Luo, Xiaojun Chang, Lina Yao

TL;DR
This paper introduces a novel point cloud-based two-stream transformer network that effectively recognizes two-person interactions by capturing spatial, appearance, and motion features, outperforming existing methods on large-scale datasets.
Contribution
The proposed model combines multi-level feature aggregation with a dynamic point transformer and introduces an efficient frame sampling method for improved interaction recognition.
Findings
Outperforms state-of-the-art methods on the NTU RGB+D datasets
Effectively captures local and global interaction features
Demonstrates robustness with efficient frame sampling
Abstract
As a fundamental aspect of human life, two-person interactions contain meaningful information about people's activities, relationships, and social settings. Human action recognition serves as the foundation for many smart applications, with a strong focus on personal privacy. However, recognizing two-person interactions poses more challenges due to increased body occlusion and overlap compared to single-person actions. In this paper, we propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition. Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information. To achieve this, we introduce a designed frame selection method named Interval Frame Sampling (IFS), which efficiently samples frames from videos,…
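The abstract names Interval Frame Sampling but, as truncated here, does not say how frames are chosen. As a rough illustration only, one common interval-based scheme splits a video into equal-width intervals and draws one frame from each, so the sample spans the whole clip at a fixed cost. The sketch below assumes that scheme; it is not confirmed to be the authors' procedure, and the function name `interval_frame_indices` and its parameters are hypothetical.

```python
import random


def interval_frame_indices(num_frames, num_samples, rng=None):
    """Pick one frame index from each of `num_samples` equal-width intervals.

    Hypothetical helper: an assumed interval-based sampler, not the
    paper's confirmed IFS implementation.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(num_samples):
        # Interval k covers the half-open index range [start, end).
        start = (k * num_frames) // num_samples
        end = ((k + 1) * num_frames) // num_samples
        if end <= start:
            # The video has fewer frames than samples, so this interval
            # is empty; fall back to the frame at its start position.
            indices.append(start)
        else:
            indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Sample 8 frames from a 120-frame clip with a fixed seed.
    print(interval_frame_indices(120, 8, random.Random(0)))
```

Because each index comes from its own interval, the returned indices are already in temporal order, and a clip shorter than the requested sample count simply repeats frames rather than failing.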
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Human Motion and Animation
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout
