MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism

Yanyi Qu, Haoyang Ma, and Wenhui Xiong

arXiv:2505.22555 · cs.CV · August 14, 2025

Open Access

TL;DR

MultiFormer is a novel wireless sensing system that leverages Transformer-based feature extraction and multi-stage fusion to improve multi-person human pose estimation accuracy from CSI data, especially for high-mobility keypoints.

Contribution

The paper introduces MultiFormer, combining a Transformer-based dual-token feature extractor with a multi-stage fusion network for enhanced CSI-based pose estimation.

Findings

01. Achieves higher accuracy than state-of-the-art methods on the public MM-Fi dataset and a self-collected dataset.

02. Effectively models inter-subcarrier correlations and temporal dependencies in CSI.

03. Improves estimation of high-mobility keypoints such as wrists and elbows.

Abstract

Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet it faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose from CSI. The proposed system adopts a Transformer-based time-frequency dual-token feature extractor with multi-head self-attention, which models the inter-subcarrier correlations and temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by a Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints. Extensive experiments conducted on the public MM-Fi dataset and our self-collected dataset show that MultiFormer achieves higher…
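
As a rough illustration of the dual-token idea, the sketch below treats one CSI amplitude matrix in two ways: each subcarrier row becomes a frequency token, and each time-sample column becomes a time token. Self-attention over the frequency tokens then yields subcarrier-to-subcarrier weights, and self-attention over the time tokens yields sample-to-sample weights. This is not the authors' implementation: the tokenization rule, the matrix sizes, the random projection matrices, and the helper names (`dual_tokens`, `self_attention`, `random_projections`) are all assumptions made for illustration, and a single attention head stands in for the multi-head attention named in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n_tokens, d_in); w_q, w_k, w_v: (d_in, d_model).
    Returns the attended features and the (n_tokens, n_tokens) weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def dual_tokens(csi):
    """Split a (n_subcarriers, n_samples) CSI matrix into two token views.

    Assumed tokenization: one frequency token per subcarrier (rows) and
    one time token per sample (columns). The paper may tokenize differently.
    """
    return csi, csi.T


rng = np.random.default_rng(0)
n_subcarriers, n_samples, d_model = 8, 5, 4  # hypothetical sizes


def random_projections(d_in):
    """Random stand-ins for learned query, key, and value projections."""
    return [rng.standard_normal((d_in, d_model)) for _ in range(3)]


# Synthetic CSI amplitudes stand in for real measurements.
csi = rng.standard_normal((n_subcarriers, n_samples))
freq_tokens, time_tokens = dual_tokens(csi)

freq_out, freq_attn = self_attention(freq_tokens, *random_projections(n_samples))
time_out, time_attn = self_attention(time_tokens, *random_projections(n_subcarriers))

print(freq_attn.shape)  # subcarrier-to-subcarrier weights: (8, 8)
print(time_attn.shape)  # sample-to-sample weights: (5, 5)
```

Each row of `freq_attn` sums to one and describes how strongly one subcarrier attends to the others, which is one way to read "inter-subcarrier correlations"; `time_attn` plays the same role across time samples.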

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Context-Aware Activity Recognition Systems

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing