3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition

Lei Wang; Piotr Koniusz

arXiv:2303.14474 · cs.CV · March 28, 2023 · 1 citation

TL;DR

The paper introduces 3Mformer, a novel multi-order multi-mode transformer that models higher-order relationships in skeletal data for action recognition, outperforming existing GCN and transformer models.

Contribution

It proposes a hypergraph-based approach with a multi-order transformer architecture that captures complex joint dependencies for improved skeletal action recognition.

Findings

01. Achieves state-of-the-art accuracy on benchmark datasets.
02. Effectively models higher-order joint dependencies.
03. Outperforms GCN and transformer-based methods.

Abstract

Many skeletal action recognition models use GCNs to represent the human body as 3D body joints connected by body parts. GCNs aggregate one- or few-hop graph neighbourhoods and ignore dependencies between body joints that are not linked. We propose forming a hypergraph to model hyper-edges between graph nodes (e.g., third- and fourth-order hyper-edges capture three and four nodes), which helps capture higher-order motion patterns of groups of body joints. We split action sequences into temporal blocks, and a Higher-order Transformer (HoT) produces embeddings of each temporal block based on (i) the body joints, (ii) pairwise links of body joints and (iii) higher-order hyper-edges of skeleton body joints. We combine such HoT embeddings of hyper-edges of orders 1, ..., r by a novel Multi-order Multi-mode Transformer (3Mformer) with two modules whose order can be exchanged to achieve coupled-mode attention on…
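
To make the abstract's two structural ideas concrete, the following is a minimal sketch, not the authors' implementation. It enumerates hyper-edges of a given order as unordered groups of distinct joint indices, and splits a frame sequence into temporal blocks. The function names, the joint count of 25, and the block length and stride are all assumptions chosen for illustration; the abstract does not state them.

```python
from itertools import combinations


def hyper_edges(num_joints, order):
    """Return every unordered group of `order` distinct joint indices.

    order=1 gives single joints, order=2 gives joint pairs, and
    order=3 or 4 gives the third- and fourth-order hyper-edges
    mentioned in the abstract. (Hypothetical helper for illustration.)
    """
    return list(combinations(range(num_joints), order))


def temporal_blocks(num_frames, block_len, stride):
    """Return (start, end) frame ranges of fixed-length temporal blocks.

    `block_len` and `stride` are assumed parameters; trailing frames
    that do not fill a whole block are dropped.
    """
    last_start = num_frames - block_len
    return [(s, s + block_len) for s in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    num_joints = 25  # assumed joint count, not taken from the abstract
    for order in (1, 2, 3, 4):
        print(f"order {order}: {len(hyper_edges(num_joints, order))} hyper-edges")
    print(temporal_blocks(num_frames=30, block_len=10, stride=5))
```

The number of hyper-edges of order r over J joints is the binomial coefficient C(J, r), so the count grows quickly with the order. Any model that forms embeddings per hyper-edge therefore has to handle far more order-3 and order-4 groups than joints or pairwise links.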

Taxonomy

Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Anomaly Detection Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Dense Connections