Multimodal feature fusion for CNN-based gait recognition: an empirical comparison

Francisco Manuel Castro; Manuel Jesús Marín-Jiménez; Nicolás Guil; Nicolás Pérez de la Blanca

arXiv:1806.07753·cs.CV·February 21, 2020

TL;DR

This paper compares CNN architectures and multimodal fusion methods for gait recognition using raw pixels, optical flow, and depth maps, demonstrating that simple inputs and effective fusion can achieve state-of-the-art results.

Contribution

It provides a comprehensive empirical comparison of CNN architectures, input modalities, and fusion strategies for gait recognition, highlighting the effectiveness of raw pixel inputs and multimodal fusion.

Findings

01. Raw pixel inputs are competitive with silhouette-based features.

02. Fusion of multiple modalities improves recognition accuracy.

03. Proper CNN architecture design is crucial for optimal performance.

Abstract

Identifying people in video by the way they walk (i.e. gait) is a relevant computer vision task that relies on a non-invasive approach. Current standard approaches typically derive gait signatures from sequences of binary energy maps of subjects extracted from images, but this process introduces a large amount of non-stationary noise, which limits their efficacy. In contrast, in this paper we focus on the raw pixels, or simple functions derived from them, letting advanced learning techniques extract relevant features. Therefore, we present a comparative study of different Convolutional Neural Network (CNN) architectures using three different modalities (i.e. gray pixels, optical flow channels and depth maps) on two widely adopted and challenging datasets: TUM-GAID and CASIA-B. In addition, we perform a comparative study between different early and late fusion methods…
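
The abstract mentions late fusion without spelling out a scheme. As a rough illustration only, the sketch below shows two generic late-fusion rules (product and weighted sum) applied to per-modality class probabilities. It is an assumption about what late fusion can look like, not the authors' implementation: the logits, the weights, and all function names are hypothetical, and the three "networks" are stand-ins for modality-specific CNNs (gray, optical flow, depth).

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_product(prob_vectors):
    """Multiply per-class probabilities across modalities, then renormalise."""
    fused = [1.0] * len(prob_vectors[0])
    for probs in prob_vectors:
        fused = [f * p for f, p in zip(fused, probs)]
    total = sum(fused)
    return [f / total for f in fused]


def late_fusion_weighted_sum(prob_vectors, weights):
    """Weighted average of per-class probabilities across modalities."""
    total_w = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors)) / total_w
        for c in range(n_classes)
    ]


def predict(fused):
    """Return the index of the most probable subject."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for 3 subjects from three modality-specific networks.
gray_logits = [2.0, 1.0, 0.1]
flow_logits = [0.5, 2.5, 0.3]
depth_logits = [0.2, 2.0, 1.5]

scores = [softmax(l) for l in (gray_logits, flow_logits, depth_logits)]
fused_product = late_fusion_product(scores)
fused_weighted = late_fusion_weighted_sum(scores, [0.2, 0.5, 0.3])  # made-up weights

product_id = predict(fused_product)
weighted_id = predict(fused_weighted)
```

In this toy example the gray-pixel network alone would pick subject 0, while both fusion rules pick subject 1 because the other two modalities agree on it. Early fusion, by contrast, would combine the modalities before classification (for instance by concatenating intermediate features), which requires training a joint model rather than combining outputs.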
