CoHear: Conversation Enhancement via Multi-Earphone Collaboration
Lixing He, Yunqi Guo, Zhenyu Yan, Guoliang Xing

TL;DR
ClearSphere is a system that enhances speech clarity in noisy, crowded environments by enabling multi-earphone collaboration, using a novel network protocol and deep learning for real-time, conversation-level speech extraction.
Contribution
ClearSphere introduces a conversation-driven network protocol and a robust speech extraction model that work together to enhance multi-user conversations in real time without supporting infrastructure.
Findings
Achieves over 90% accuracy in group formation
Improves speech quality by up to 8.8 dB over baselines
Demonstrates real-time performance on mobile devices
Abstract
In crowded places such as conferences, background noise, overlapping voices, and lively interactions make it difficult to have clear conversations. This situation often worsens the phenomenon known as "cocktail party deafness." We present ClearSphere, a collaborative system that enhances speech at the conversation level using multiple earphones. Real-time conversation enhancement requires holistic modeling of all the members in the conversation and an effective way to extract their speech from the mixture. ClearSphere bridges the acoustic sensor system and state-of-the-art deep learning for target speech extraction by making two key contributions: 1) a conversation-driven network protocol, and 2) a robust target conversation extraction model. Our networking protocol enables mobile, infrastructure-free coordination among earphone devices. Our conversation extraction model can leverage the…
Taxonomy
Topics: Speech and dialogue systems · Robotics and Automated Systems
