Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
Wenhan Wu, Zhishuai Guo, Chen Chen, Hongfei Xue, Aidong Lu

TL;DR
This paper introduces FS-VAE, a novel model that enhances zero-shot skeleton-based action recognition by integrating frequency decomposition and multi-level semantic alignment to better capture fine-grained action details.
Contribution
The paper proposes a Frequency-Semantic Enhanced Variational Autoencoder (FS-VAE) that combines frequency-based enhancement modules with a calibrated cross-alignment loss to improve zero-shot action recognition.
Findings
Enhanced semantic features improve action differentiation.
Frequency decomposition boosts robustness in recognition.
Effective alignment reduces semantic ambiguity.
Abstract
Zero-shot skeleton-based action recognition aims to develop models capable of identifying actions beyond the categories encountered during training. Previous approaches have primarily focused on aligning visual and semantic representations but often overlooked the importance of fine-grained action patterns in the semantic space (e.g., the hand movements in drinking water and brushing teeth). To address these limitations, we propose a Frequency-Semantic Enhanced Variational Autoencoder (FS-VAE) to explore the skeleton semantic representation learning with frequency decomposition. FS-VAE consists of three key components: 1) a frequency-based enhancement module with high- and low-frequency adjustments to enrich the skeletal semantics learning and improve the robustness of zero-shot action recognition; 2) a semantic-based action description with multilevel alignment to capture both local…
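The high- and low-frequency adjustment named in component 1 can be illustrated with a small sketch. The abstract does not say which transform or gains FS-VAE uses, so everything here is an assumption made for illustration: the choice of a discrete cosine transform along the time axis, and the hypothetical names `adjust_frequencies`, `cutoff`, `low_gain`, and `high_gain`. It should be read as one plausible way to rescale frequency bands of a single joint coordinate's trajectory, not as the authors' implementation.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence (illustrative, O(n^2))."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi / n * (t + 0.5) * k)
            for t, x in enumerate(signal)
        )
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi / n * (t + 0.5) * k)
        signal.append(total)
    return signal


def adjust_frequencies(trajectory, cutoff, low_gain=1.0, high_gain=1.0):
    """Rescale low- and high-frequency bands of one coordinate over time.

    Hypothetical helper: coefficients with index < cutoff are treated as
    "low frequency" (coarse motion), the rest as "high frequency"
    (fine detail or jitter). Both gains are assumed values, not taken
    from the paper.
    """
    coeffs = dct(trajectory)
    scaled = [
        c * (low_gain if k < cutoff else high_gain)
        for k, c in enumerate(coeffs)
    ]
    return idct(scaled)


# Example: a slow motion with frame-to-frame jitter, 16 frames long.
trajectory = [math.sin(0.3 * t) + 0.05 * (-1) ** t for t in range(16)]
# Keep the coarse motion, halve the high-frequency content.
smoothed = adjust_frequencies(trajectory, cutoff=4, high_gain=0.5)
```

With both gains set to 1.0 the function returns the input trajectory up to floating-point error, which makes it easy to check that the band split itself loses no information. In practice a library routine such as a fast DCT would replace the quadratic loops used here for clarity.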
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Domain Adaptation and Few-Shot Learning
