TL;DR
VST-Pose is a novel WiFi-based human pose estimation framework that uses a spatiotemporal attention network with velocity modeling to achieve high accuracy and robustness in indoor environments.
Contribution
The paper introduces VST-Pose, a new deep learning model with a dual-stream spatiotemporal attention backbone and velocity modeling for improved WiFi-based pose estimation.
Findings
Achieves 92.2% accuracy on the PCK@50 metric.
Outperforms existing methods by 8.3% in PCK@50.
Demonstrates robustness on public datasets.
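For readers unfamiliar with the metric, the sketch below shows one common way to compute PCK (Percentage of Correct Keypoints). It assumes that PCK@50 means a threshold of 50% of a per-sample reference length; the function name `pck`, the `ref_len` argument, and the choice of reference length (for example a torso length or bounding-box diagonal) are illustrative assumptions, since the normalization used in the paper is not stated here.

```python
import numpy as np

def pck(pred, gt, ref_len, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * ref_len.

    pred, gt : arrays of shape (N, J, 2), predicted and ground-truth 2D joints.
    ref_len  : array of shape (N,), per-sample normalization length
               (assumed convention; not taken from the paper).
    alpha    : threshold fraction; 0.5 is assumed to correspond to PCK@50.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    ref_len = np.asarray(ref_len, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)   # (N, J) Euclidean error per joint
    thresh = alpha * ref_len[:, None]           # (N, 1), broadcast over joints
    return float(np.mean(dist <= thresh))

# Toy example: one sample, four joints, reference length 10 (threshold 5).
gt = np.zeros((1, 4, 2))
pred = np.array([[[0.0, 0.0], [3.0, 0.0], [10.0, 0.0], [0.0, 1.0]]])
score = pck(pred, gt, ref_len=[10.0])
print(score)  # errors are 0, 3, 10, 1, so three of four joints count as correct
```

Under this convention, a score of 0.922 would mean that 92.2% of predicted joints fall within half the reference length of their ground-truth positions.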
Abstract
WiFi-based human pose estimation has emerged as a promising non-visual alternative due to its penetrability and privacy advantages. This paper presents VST-Pose, a novel deep learning framework for accurate and continuous pose estimation using WiFi channel state information. The proposed method introduces ViSTA-Former, a spatiotemporal attention backbone with a dual-stream architecture that separately captures temporal dependencies and structural relationships among body joints. To enhance sensitivity to subtle human motions, a velocity modeling branch is integrated into the framework, which learns short-term keypoint displacement patterns and improves fine-grained motion representation. We construct a 2D pose dataset specifically designed for smart home care scenarios and demonstrate that our method achieves 92.2% accuracy on the PCK@50…
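As a rough illustration of what "short-term keypoint displacement" can mean, the sketch below computes frame-to-frame differences of a 2D pose sequence. This is only one plausible form of a velocity signal, written under the assumption that velocity is a first-order finite difference; the function name `keypoint_velocity` is hypothetical, and the abstract does not describe how the actual velocity branch is built or trained.

```python
import numpy as np

def keypoint_velocity(poses):
    """First-order finite difference of a pose sequence.

    poses : array of shape (T, J, 2), T frames of J 2D keypoints.
    Returns an array of the same shape; entry t holds the displacement
    from frame t-1 to frame t, and the first frame is set to zero.
    """
    poses = np.asarray(poses, dtype=float)
    vel = np.zeros_like(poses)
    vel[1:] = poses[1:] - poses[:-1]
    return vel

# Toy sequence: 3 frames, 2 joints, every coordinate moves by 4 per frame.
poses = np.arange(12, dtype=float).reshape(3, 2, 2)
vel = keypoint_velocity(poses)

# Summing displacements from the first frame recovers the original sequence.
recovered = poses[0] + np.cumsum(vel, axis=0)
```

A displacement signal of this kind emphasizes small changes between neighboring frames that absolute coordinates tend to hide, which is consistent with the abstract's stated goal of improving sensitivity to subtle motion.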
