Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition

Yao Liu; Gangfeng Cui; Jiahui Luo; Xiaojun Chang; Lina Yao

arXiv:2307.11973 · cs.CV · May 15, 2024

TL;DR

This paper introduces a novel point cloud-based two-stream transformer network that effectively recognizes two-person interactions by capturing spatial, appearance, and motion features, outperforming existing methods on large-scale datasets.

Contribution

The proposed model combines multi-level feature aggregation with a dynamic point transformer and introduces an efficient frame sampling method for improved interaction recognition.

Findings

01. Outperforms state-of-the-art methods on the NTU RGB+D datasets

02. Effectively captures local and global interaction features

03. Demonstrates robustness with efficient frame sampling

Abstract

As a fundamental aspect of human life, two-person interactions contain meaningful information about people's activities, relationships, and social settings. Human action recognition serves as the foundation for many smart applications, with a strong focus on personal privacy. However, recognizing two-person interactions poses more challenges than recognizing single-person actions because of increased body occlusion and overlap. In this paper, we propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition. Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information. To achieve this, we introduce a frame selection method named Interval Frame Sampling (IFS), which efficiently samples frames from videos,…
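The abstract is cut off before it explains how IFS chooses frames, so the following is only a generic sketch of interval-based sampling and should not be read as the paper's method. It assumes the video is split into equal-width intervals and that the middle frame of each interval is kept; the function name `interval_frame_indices` and the midpoint rule are hypothetical choices made for illustration.

```python
from typing import List


def interval_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick one frame index from each of `num_samples` equal-width intervals.

    Hypothetical helper: the midpoint rule is an assumption made for this
    sketch, not the definition of IFS given in the paper.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # If the clip has no more frames than requested, keep every frame.
    if num_samples >= num_frames:
        return list(range(num_frames))
    width = num_frames / num_samples
    # The midpoint of interval k is (k + 0.5) * width; truncate to an index.
    return [int((k + 0.5) * width) for k in range(num_samples)]


if __name__ == "__main__":
    # Sample 4 frames from a 100-frame clip.
    print(interval_frame_indices(100, 4))
```

Sampling one frame per interval keeps the selected frames spread across the whole clip, which is one plausible reading of "efficiently samples frames from videos"; the actual selection rule used by IFS may differ.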

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Human Motion and Animation

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout