VibOmni: Towards Scalable Bone-conduction Speech Enhancement on Earables

Lixing He; Yunqi Guo; Haozheng Hou; Zhenyu Yan

arXiv:2512.02515·cs.SD·December 3, 2025

Open Access

TL;DR

VibOmni is a multi-modal speech enhancement system for earables that combines microphone audio with bone-conducted vibrations captured by IMUs. It improves speech quality and speech recognition accuracy in noisy environments and is validated in real-world settings.

Contribution

The paper introduces a lightweight deep neural network that fuses audio and vibration features, together with a data augmentation method that compensates for the scarcity of paired audio-vibration data, enabling scalable and adaptive speech enhancement.

Findings

01. Up to 21% PESQ improvement
02. 26% SNR enhancement
03. 40% WER reduction
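
The page does not say whether these percentages are relative or absolute changes. As a minimal sketch, assuming they are relative to a noisy baseline, the following Python computes word error rate (WER) as a word-level edit distance and then the relative reduction between two systems. The transcripts are toy strings invented for illustration, not data from the paper, and `word_error_rate` and `relative_reduction` are hypothetical helper names.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, enhanced: float) -> float:
    """Fraction by which `enhanced` is lower than `baseline`."""
    return (baseline - enhanced) / baseline


# Toy transcripts (hypothetical, for illustration only).
reference = "turn on the kitchen lights please"
noisy_hypothesis = "turn on a chicken light please"
enhanced_hypothesis = "turn on the kitchen light please"

wer_noisy = word_error_rate(reference, noisy_hypothesis)
wer_enhanced = word_error_rate(reference, enhanced_hypothesis)
reduction = relative_reduction(wer_noisy, wer_enhanced)
print(f"WER noisy={wer_noisy:.3f} enhanced={wer_enhanced:.3f} "
      f"relative reduction={reduction:.1%}")
```

Under the relative reading, a "40% WER reduction" would mean the enhanced WER is 0.6 times the baseline WER; under an absolute reading it would mean 40 percentage points fewer errors, which is a much stronger claim.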

Abstract

Earables, such as True Wireless Stereo earphones and VR/AR headsets, are increasingly popular, yet their compact design poses challenges for robust voice-related applications like telecommunication and voice assistant interactions in noisy environments. Existing speech enhancement systems, reliant solely on omnidirectional microphones, struggle with ambient noise like competing speakers. To address these issues, we propose VibOmni, a lightweight, end-to-end multi-modal speech enhancement system for earables that leverages bone-conducted vibrations captured by widely available Inertial Measurement Units (IMUs). VibOmni integrates a two-branch encoder-decoder deep neural network to fuse audio and vibration features. To overcome the scarcity of paired audio-vibration datasets, we introduce a novel data augmentation technique that models Bone Conduction Functions (BCFs) from limited…
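
The abstract is truncated before it describes how BCFs are modeled, so the following is only a minimal sketch under a stated assumption: a BCF is treated as a linear, per-frequency magnitude response that maps air-conducted speech to the vibration an IMU would record. Under that assumption, a response can be estimated from a few paired frames and then applied to clean speech to synthesize additional vibration data. All function names are hypothetical, the "paired recordings" are simulated with a toy low-pass filter, and none of this should be read as the paper's actual method.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def estimate_bcf(audio_frames, vib_frames, eps=1e-8):
    """ASSUMPTION: BCF = per-bin magnitude ratio |V(f)| / |A(f)|,
    accumulated over all paired frames."""
    n = len(audio_frames[0])
    num, den = [0.0] * n, [0.0] * n
    for a, v in zip(audio_frames, vib_frames):
        A, V = dft(a), dft(v)
        for k in range(n):
            num[k] += abs(V[k])
            den[k] += abs(A[k])
    return [num[k] / (den[k] + eps) for k in range(n)]


def synthesize_vibration(audio_frame, bcf):
    """Scale each frequency bin of clean audio by the estimated BCF,
    keeping the audio's phase."""
    A = dft(audio_frame)
    return idft([A[k] * bcf[k] for k in range(len(A))])


def circular_lowpass(x, taps=4):
    """Toy stand-in for bone conduction: a circular moving average."""
    n = len(x)
    return [sum(x[(t - m) % n] for m in range(taps)) / taps for t in range(n)]


# Simulated "paired" data (hypothetical; real data would come from a
# microphone and an IMU recorded simultaneously).
random.seed(0)
N = 32
paired_audio = [[random.gauss(0, 1) for _ in range(N)] for _ in range(8)]
paired_vib = [circular_lowpass(a) for a in paired_audio]

bcf = estimate_bcf(paired_audio, paired_vib)

# Augmentation step: turn unpaired clean audio into a synthetic vibration frame.
clean = [random.gauss(0, 1) for _ in range(N)]
synthetic_vib = synthesize_vibration(clean, bcf)
```

A magnitude-only response discards phase and any nonlinearity or speaker-to-speaker variation, so a practical augmentation scheme would likely need more than this sketch captures.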

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques