GaitMA: Pose-guided Multi-modal Feature Fusion for Gait Recognition

Fanxu Min; Shaoxiang Guo; Fan Hao; Junyu Dong

arXiv:2407.14812 · cs.CV · July 23, 2024 · 1 citation

TL;DR

GaitMA is a multi-modal gait recognition framework that fuses silhouette and skeleton features through co-attention and mutual learning modules, and it outperforms existing methods on the Gait3D, OU-MVLP, and CASIA-B datasets.

Contribution

The paper proposes a novel multi-modal fusion approach with co-attention and mutual learning modules for robust gait recognition.

Findings

01. Outperforms existing methods on the Gait3D, OU-MVLP, and CASIA-B datasets.

02. Fuses silhouette and skeleton features effectively, improving recognition accuracy.

03. Combines the two modalities to compensate for occlusion-degraded silhouettes and for the lack of dense semantic features in skeletons.

Abstract

Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns. Existing appearance-based methods utilize CNN or Transformer to extract spatial and temporal features from silhouettes, while model-based methods employ GCN to focus on the special topological structure of skeleton points. However, the quality of silhouettes is limited by complex occlusions, and skeletons lack dense semantic features of the human body. To tackle these problems, we propose a novel gait recognition framework, dubbed Gait Multi-model Aggregation Network (GaitMA), which effectively combines two modalities to obtain a more robust and comprehensive gait representation for recognition. First, skeletons are represented by joint/limb-based heatmaps, and features from silhouettes and skeletons are respectively extracted using two CNN-based feature extractors. Second,…
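The abstract states that skeletons are represented by joint/limb-based heatmaps before being passed to a CNN. As a rough illustration of what such a representation might look like, the sketch below renders 2D keypoints as Gaussian joint heatmaps and renders limbs by each pixel's distance to the line segment between two joints. The Gaussian form, the `sigma` value, the grid size, the max-pooling of overlapping blobs, and the function names are all assumptions made for this example; the abstract does not specify how GaitMA builds its heatmaps.

```python
import math


def joint_heatmap(joints, height, width, sigma=1.5):
    """Render one Gaussian blob per joint; overlapping blobs keep the maximum.

    `joints` is a list of (x, y) pixel coordinates. `sigma` is an assumed value.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for jx, jy in joints:
        for y in range(height):
            for x in range(width):
                d2 = (x - jx) ** 2 + (y - jy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def point_segment_distance_sq(px, py, ax, ay, bx, by):
    """Squared distance from point P to the segment AB."""
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        t = 0.0  # degenerate limb: both joints coincide
    else:
        # Project P onto AB and clamp the projection to the segment.
        t = ((px - ax) * abx + (py - ay) * aby) / length_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * abx, ay + t * aby
    return (px - cx) ** 2 + (py - cy) ** 2


def limb_heatmap(joints, limbs, height, width, sigma=1.5):
    """Render each limb (a pair of joint indices) as a Gaussian ridge."""
    heatmap = [[0.0] * width for _ in range(height)]
    for i, j in limbs:
        (ax, ay), (bx, by) = joints[i], joints[j]
        for y in range(height):
            for x in range(width):
                d2 = point_segment_distance_sq(x, y, ax, ay, bx, by)
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Hypothetical toy skeleton: three joints connected by two limbs.
joints = [(4.0, 2.0), (4.0, 8.0), (2.0, 13.0)]
limbs = [(0, 1), (1, 2)]
joint_map = joint_heatmap(joints, height=16, width=8)
limb_map = limb_heatmap(joints, limbs, height=16, width=8)
```

In this toy case the joint map peaks only at the three keypoints, while the limb map also responds along the connecting segments, so the limb channel carries the body's topology in an image-like form that a CNN can consume alongside the silhouette.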

Taxonomy

Topics: Gait Recognition and Analysis · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Focus · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention