WirelessSenseLLM: Zero-Shot Human Activity Understanding by Bridging Wireless Signals and Human Language
Mahmuda Keya, Sneh Pillai, Jiawei Yuan, Kai Zeng, and Long Jiao

TL;DR
WirelessSenseLLM introduces a novel framework that uses large language models to interpret human motion from unsegmented Wi-Fi signals in a zero-shot setting, enabling natural language descriptions and reasoning.
Contribution
WirelessSenseLLM bridges the gap between wireless signals and language: a CSI-to-Language Adapter and a cross-modal projection enable zero-shot human activity understanding without segmented training data.
Findings
Achieves 92% accuracy in zero-shot action recognition
Improves language reasoning by 30% and factual accuracy by 15%
Improves multi-person motion explanation performance by 12.33%
Abstract
There is growing interest in enabling wireless sensing systems to interpret human motion from unsegmented wireless signals; however, existing CSI-based applications rely heavily on accurate signal segmentation and predefined action labels, limiting their applicability in zero-shot scenarios. We present WirelessSenseLLM, a language-driven framework that leverages large language models (LLMs) to enable zero-shot human motion understanding from unsegmented Wi-Fi Channel State Information (CSI). To bridge the modality gap between time-series CSI and discrete language representations, we introduce a CSI-to-Language Adapter and a cross-modal projection mechanism that maps CSI features into a language-aligned semantic space. This design enables the generation of fine-grained natural language descriptions of sequential and overlapping human motions, supporting downstream reasoning without…
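The abstract does not give the internals of the CSI-to-Language Adapter or the projection, so the following is only a toy sketch of the general idea it describes: map a CSI feature vector into a language-aligned space, then pick the closest text embedding by cosine similarity. The function names, the linear projection, the weights, and the two-dimensional "text embeddings" are all hypothetical illustration values, not the authors' method or data.

```python
import math

def project(features, weights):
    """Linear projection of a CSI feature vector into the language space.
    (Assumption: a plain matrix multiply stands in for the learned projection.)"""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def zero_shot_label(features, weights, text_embeddings):
    """Return the description whose text embedding is closest to the projected CSI features."""
    projected = project(features, weights)
    return max(text_embeddings, key=lambda label: cosine(projected, text_embeddings[label]))

# Hypothetical toy values: a 2x3 projection (3-dim CSI features -> 2-dim language space)
# and made-up embeddings for two activity descriptions.
WEIGHTS = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
TEXT_EMBEDDINGS = {"walking": [1.0, 0.0], "sitting down": [0.0, 1.0]}

csi_features = [0.9, 0.1, 0.2]  # made-up features from one CSI window
label = zero_shot_label(csi_features, WEIGHTS, TEXT_EMBEDDINGS)
print(label)
```

Because matching happens in the shared embedding space, a new activity description can be added to `TEXT_EMBEDDINGS` without retraining the projection, which is the sense in which such a setup is zero-shot. A real system would replace the toy embeddings with outputs of a language model and learn the projection from data.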
