VST-Pose: A Velocity-Integrated Spatiotemporal Attention Network for Human WiFi Pose Estimation

Xinyu Zhang; Zhonghao Ye; Jingwei Zhang; Xiang Tian; Zhisheng Liang; and Shipeng Yu

arXiv:2507.09672·cs.CV·July 15, 2025

TL;DR

VST-Pose is a novel WiFi-based human pose estimation framework that uses a spatiotemporal attention network with velocity modeling to achieve high accuracy and robustness in indoor environments.

Contribution

The paper introduces VST-Pose, a new deep learning model with a dual-stream spatiotemporal attention backbone and velocity modeling for improved WiFi-based pose estimation.

Findings

01. Achieves 92.2% accuracy on the PCK@50 metric.
02. Outperforms existing methods by 8.3% in PCK@50.
03. Demonstrates robustness on public datasets.
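
PCK@50 (Percentage of Correct Keypoints) is commonly defined as the fraction of predicted keypoints whose distance to the ground truth is within 0.5 times a reference length. The sketch below illustrates that definition for a single 2D pose. The function name `pck`, the caller-supplied `ref_length`, and the sample coordinates are illustrative assumptions; this listing does not state which normalizing length the paper uses, and this is not the authors' evaluation code.

```python
import math


def pck(pred, gt, ref_length, alpha=0.5):
    """Return the fraction of keypoints with error <= alpha * ref_length.

    pred, gt: equally long lists of (x, y) keypoints for one pose.
    ref_length: normalizing length (assumed here; e.g. a torso or box size).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    threshold = alpha * ref_length
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(pred)


# Hypothetical 4-keypoint pose; per-keypoint errors are 1, 3, 0, and 8.
gt = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
pred = [(1.0, 0.0), (10.0, 3.0), (10.0, 10.0), (8.0, 10.0)]
score = pck(pred, gt, ref_length=10.0)  # threshold is 5.0, so 3 of 4 pass
print(score)
```

With a threshold of 5.0, only the last keypoint (error 8) is counted as wrong. A dataset-level score would average this value over all evaluated poses.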

Abstract

WiFi-based human pose estimation has emerged as a promising non-visual alternative approach due to its penetrability and privacy advantages. This paper presents VST-Pose, a novel deep learning framework for accurate and continuous pose estimation using WiFi channel state information. The proposed method introduces ViSTA-Former, a spatiotemporal attention backbone that adopts a dual-stream architecture to separately capture temporal dependencies and structural relationships among body joints. To enhance sensitivity to subtle human motions, a velocity modeling branch is integrated into the framework; it learns short-term keypoint displacement patterns and improves fine-grained motion representation. We construct a 2D pose dataset specifically designed for smart home care scenarios and demonstrate that our method achieves 92.2% accuracy on the PCK@50…
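
The abstract describes the velocity branch as learning short-term keypoint displacement patterns but does not say how those displacements are computed. One plausible reading is a first-order difference between consecutive frames, sketched below. The function name `keypoint_velocities` and the sample frames are hypothetical, and the actual branch in VST-Pose may be defined differently.

```python
def keypoint_velocities(frames):
    """Return per-keypoint displacements between consecutive frames.

    frames: list of poses, each a list of (x, y) keypoints.
    Output has len(frames) - 1 entries, each a list of (dx, dy).
    """
    velocities = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same number of keypoints")
        velocities.append(
            [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        )
    return velocities


# Hypothetical sequence of three frames with two keypoints each.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(1.0, 0.0), (1.0, 3.0)],
    [(3.0, 0.0), (1.0, 2.0)],
]
vel = keypoint_velocities(frames)
print(vel)
```

Displacements of this kind could serve as an auxiliary regression target alongside the keypoint positions; whether the paper uses them that way is not stated in this listing.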

Code & Models

Repositories

carmenqing/vst-pose (PyTorch, official)
