Human Silhouette and Skeleton Video Synthesis through Wi-Fi signals
Danilo Avola, Marco Cascio, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti

TL;DR
This paper introduces a novel neural network that synthesizes human silhouette and skeleton videos solely from Wi-Fi signals, enabling visual data generation without cameras through cross-modality learning.
Contribution
The paper proposes a two-branch generative model with a teacher-student design that maps Wi-Fi radio data to visual features, so that Wi-Fi signals take the place of visual input.
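As a rough illustration of the teacher-student idea, the toy sketch below trains a "student" that sees only a Wi-Fi-like vector to reproduce the feature that a frozen "teacher" computes from the paired visual frame. It is not the authors' architecture: the linear student, the averaging teacher, the synthetic data, and every name (`teacher`, `student`, `make_pair`, `train`) are assumptions made purely for illustration.

```python
import random

random.seed(0)

def teacher(frame):
    """Frozen 'teacher' (hypothetical): maps a visual frame to one feature."""
    return sum(frame) / len(frame)

def student(csi, weights, bias):
    """Trainable 'student' (hypothetical): linear map from a CSI-like vector."""
    return sum(w * x for w, x in zip(weights, csi)) + bias

def make_pair():
    """Toy synchronized sample: one hidden pose value drives both modalities."""
    pose = random.random()
    frame = [pose, 0.5 * pose, 1.5 * pose]  # stand-in for a video frame
    csi = [2.0 * pose, 1.0 - pose]          # stand-in for CSI amplitudes
    return csi, frame

def train(steps=2000, lr=0.05):
    """Fit the student to the teacher's features with per-sample gradient steps."""
    weights, bias = [0.0, 0.0], 0.0
    losses = []
    for _ in range(steps):
        csi, frame = make_pair()
        target = teacher(frame)  # supervision comes from the visual branch
        error = student(csi, weights, bias) - target
        losses.append(error ** 2)
        # Gradient of the squared error with respect to weights and bias.
        weights = [w - lr * 2 * error * x for w, x in zip(weights, csi)]
        bias -= lr * 2 * error
    return weights, bias, losses

weights, bias, losses = train()
print("mean loss, first 100 steps:", sum(losses[:100]) / 100)
print("mean loss, last 100 steps:", sum(losses[-100:]) / 100)
```

After training, `student` needs only the CSI-like vector to approximate the teacher's feature, which mirrors the summary's point that Wi-Fi data stands in for visual data.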
Findings
Effective synthesis of silhouette videos from Wi-Fi signals
High-quality skeleton video generation demonstrated
Cross-modality supervision improves visual feature inference
Abstract
The increasing availability of wireless access points (APs) is leading towards human sensing applications based on Wi-Fi signals as support or alternative tools to the widespread visual sensors, where the signals enable to address well-known vision-related problems such as illumination changes or occlusions. Indeed, using image synthesis techniques to translate radio frequencies to the visible spectrum can become essential to obtain otherwise unavailable visual data. This domain-to-domain translation is feasible because both objects and people affect electromagnetic waves, causing radio and optical frequencies variations. In literature, models capable of inferring radio-to-visual features mappings have gained momentum in the last few years since frequency changes can be observed in the radio domain through the channel state information (CSI) of Wi-Fi APs, enabling signal-based feature…
