MaskClip: Detachable Clip-on Piezoelectric Sensing of Mask Surface Vibrations for Real-time Noise-Robust Speech Input
Hirotaka Hiraki, Jun Rekimoto

TL;DR
MaskClip is a novel piezoelectric sensing device that attaches to masks to capture the wearer's voice by detecting mask surface vibrations, effectively filtering ambient noise for clearer speech input in noisy environments.
Contribution
We introduce MaskClip, a detachable clip-on device that uses piezoelectric sensors to selectively sense mask surface vibrations, improving speech recognition accuracy in noisy settings.
Findings
Achieved a low Character Error Rate of 6.1% in noisy environments.
High satisfaction scores in subjective evaluations by 102 participants.
Superior noise robustness compared to conventional microphones.
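Character Error Rate (CER) is conventionally defined as the character-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. The summary does not say how the authors computed it, so the following is only a minimal sketch of that conventional definition; the function names `edit_distance` and `character_error_rate` are hypothetical and not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a nine-character reference.
    print(character_error_rate("mask clip", "mask clap"))
```

Under this definition, a CER of 6.1% corresponds to roughly six character errors per hundred reference characters.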
Abstract
Masks are essential in medical settings and during infectious outbreaks but significantly impair speech communication, especially in environments with background noise. Existing solutions often require substantial computational resources or compromise hygiene and comfort. We propose a novel sensing approach that captures only the wearer's voice by detecting mask surface vibrations using a piezoelectric sensor. Our developed device, MaskClip, employs a stainless steel clip with an optimally positioned piezoelectric sensor to selectively capture speech vibrations while inherently filtering out ambient noise. In evaluation experiments, MaskClip outperformed conventional microphones in noisy environments, achieving a low Character Error Rate of 6.1%. Subjective evaluations by 102 participants also showed high satisfaction scores. This approach shows promise for applications…
Taxonomy
Topics: Speech and Audio Processing · Speech and dialogue systems
