Explicit Time-Frequency Dynamics for Skeleton-Based Gait Recognition
Seoyeon Ko, Yeojin Song, Egene Chung, Luca Quagliato, Taeyong Lee, Junhyug Noh

TL;DR
This paper introduces a wavelet-based time-frequency feature stream that enhances skeleton-based gait recognition by capturing dynamic cues, improving accuracy especially under appearance variations.
Contribution
It proposes a plug-and-play wavelet feature stream that augments existing skeleton backbones with explicit motion dynamics without architectural changes.
Findings
Consistent accuracy gains on CASIA-B across several skeleton backbones (GaitMixer, GaitFormer, GaitGraph).
A new skeleton-based state of the art on CASIA-B when the stream is attached to GaitMixer.
Significant gains under covariate shifts like carrying bags and wearing coats.
Abstract
Skeleton-based gait recognizers excel at modeling spatial configurations but often underuse explicit motion dynamics that are crucial under appearance changes. We introduce a plug-and-play Wavelet Feature Stream that augments any skeleton backbone with time-frequency dynamics of joint velocities. Concretely, per-joint velocity sequences are transformed by the continuous wavelet transform (CWT) into multi-scale scalograms, from which a lightweight multi-scale CNN learns discriminative dynamic cues. The resulting descriptor is fused with the backbone representation for classification, requiring no changes to the backbone architecture or additional supervision. Across CASIA-B, the proposed stream delivers consistent gains on strong skeleton backbones (e.g., GaitMixer, GaitFormer, GaitGraph) and establishes a new skeleton-based state of the art when attached to GaitMixer. The improvements…
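The front end of the pipeline described in the abstract (joint velocities, then a CWT, then a multi-scale scalogram) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not name the mother wavelet, the scales, or the velocity estimator, so the complex Morlet wavelet, the parameter `w0`, the frame-difference velocity, and the function names `joint_velocities`, `morlet`, and `cwt_scalogram` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def joint_velocities(joints):
    """Frame-to-frame differences of joint coordinates.

    joints: array of shape (T, J, C) = (frames, joints, coordinates).
    Returns an array of shape (T - 1, J, C).
    Assumption: velocity is approximated by a first-order difference.
    """
    return np.diff(joints, axis=0)

def morlet(scale, w0=6.0):
    """Sampled complex Morlet wavelet at a given scale (assumed wavelet)."""
    half = int(np.ceil(4 * scale))            # support of about +/- 4 std devs
    t = np.arange(-half, half + 1) / scale
    return np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(scale)

def cwt_scalogram(signal, scales, w0=6.0):
    """Magnitude scalogram of a 1-D signal, shape (len(scales), len(signal))."""
    n = len(signal)
    rows = []
    for s in scales:
        psi = morlet(s, w0)
        half = (len(psi) - 1) // 2
        # Correlating with psi equals convolving with its reversed conjugate.
        full = np.convolve(signal, np.conj(psi)[::-1], mode="full")
        rows.append(np.abs(full[half:half + n]))  # centered crop to length n
    return np.stack(rows)

# Toy example: one joint coordinate oscillating with a 20-frame period.
T = 200
frames = np.arange(T)
joints = np.zeros((T, 2, 2))
joints[:, 0, 0] = np.sin(2 * np.pi * frames / 20)

vel = joint_velocities(joints)
# For this Morlet, scale s responds most strongly to a period of 2*pi*s/w0 frames.
scales = np.array([5.0, 6.0 * 20 / (2 * np.pi), 60.0])
scalogram = cwt_scalogram(vel[:, 0, 0], scales)
```

In this toy setup the middle scale is tuned to the 20-frame period, so its row of the scalogram carries most of the energy. In the method summarized above, one such scalogram per joint velocity sequence would be passed to the lightweight multi-scale CNN; that stage and the fusion with the backbone are not sketched here because the abstract does not specify them.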
