Lip-Siri: Contactless Open-Sentence Silent Speech with Wi-Fi Backscatter
Ye Tian, Haohua Du, Chao Gu, Junyang Zhang, Shanyue Wang, Hao Zhou, Jiahui Hou, and Xiang-Yang Li

TL;DR
Lip-Siri is a Wi-Fi backscatter-based silent speech interface that recognizes open-vocabulary sentences by decoding lip motions, offering contactless, privacy-preserving, and energy-efficient interaction.
Contribution
To the authors' knowledge, this is the first work to enable open-vocabulary silent speech recognition over Wi-Fi backscatter, using lexicon-guided subword decoding.
Findings
Achieves 85.61% word prediction accuracy
Attains 36.87% word error rate in sentence recognition
Demonstrates reliable lip-motion extraction from Wi-Fi signals
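The two figures above are different metrics, so they need not sum to 100%. As a minimal sketch, assuming the standard definition of word error rate (word-level edit distance divided by the number of reference words), it could be computed as below. The function name and example sentences are illustrative only; the source does not state its exact scoring protocol.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("the") and one substitution.
example_wer = word_error_rate("turn on the kitchen light", "turn on kitchen lights")
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.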
Abstract
Silent speech interfaces (SSIs) enable silent interaction in noise-sensitive or privacy-sensitive settings. However, existing SSIs face practical deployment trade-offs among privacy, user experience, and energy consumption, and most remain limited to closed-set recognition over small, pre-defined vocabularies of words or sentences, which restricts real-world expressiveness. In this paper, we present Lip-Siri, to the best of our knowledge, the first Wi-Fi backscatter-based SSI that supports open-vocabulary sentence recognition via lexicon-guided subword decoding. Lip-Siri designs a frequency-shifted backscatter tag to isolate tag-modulated reflections and suppress interference from non-target motions, enabling reliable extraction of lip-motion traces from ubiquitous Wi-Fi signals. We then segment continuous traces into lip-motion units, cluster them, learn robust unit representations…
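The idea of using a frequency shift to separate tag reflections from other motion can be illustrated with a toy baseband model. This is a sketch under stated assumptions, not the paper's hardware or signal pipeline: the sample rate, shift frequency, window length, and the `lip_gain` and `body_interference` functions are all hypothetical. The tag's reflection is modeled as riding on a tone at `SHIFT_HZ`, while unshifted interference stays near 0 Hz; mixing down by `SHIFT_HZ` and averaging keeps only the shifted component.

```python
import cmath
import math


def isolate_shifted_component(samples, sample_rate_hz, shift_hz, window):
    """Mix samples down by shift_hz, then block-average over `window` samples.

    Content that sat at shift_hz lands at 0 Hz and survives the averaging;
    unshifted content lands at -shift_hz and averages out.
    """
    mixed = [
        s * cmath.exp(-2j * math.pi * shift_hz * n / sample_rate_hz)
        for n, s in enumerate(samples)
    ]
    envelope = []
    for start in range(0, len(mixed) - window + 1, window):
        block = mixed[start:start + window]
        envelope.append(abs(sum(block) / window))
    return envelope


# Toy parameters (hypothetical, chosen so the window spans whole shift periods).
SAMPLE_RATE_HZ = 1000
SHIFT_HZ = 100
WINDOW = 100  # 0.1 s per output point


def lip_gain(t):
    """Hypothetical slow amplitude change caused by lip motion (1 Hz)."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * 1.0 * t)


def body_interference(t):
    """Hypothetical strong, unshifted reflection from non-target motion."""
    return 5.0 + 2.0 * math.sin(2 * math.pi * 0.5 * t)


received = []
for n in range(2 * SAMPLE_RATE_HZ):  # two seconds of samples
    t = n / SAMPLE_RATE_HZ
    tag_reflection = lip_gain(t) * math.cos(2 * math.pi * SHIFT_HZ * t)
    received.append(body_interference(t) + tag_reflection)

# Each output point is approximately lip_gain / 2 at the block midpoint,
# even though the interference is several times stronger than the tag signal.
envelope = isolate_shifted_component(received, SAMPLE_RATE_HZ, SHIFT_HZ, WINDOW)
```

In this toy model the recovered envelope tracks the lip-motion gain because the interference never occupies the shifted band; a real receiver would also have to handle channel noise, multipath, and imperfect tag modulation, which the sketch ignores.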
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Speech Recognition and Synthesis
