USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis

Luca Jiang-Tao Yu; Running Zhao; Sijie Ji; Edith C.H. Ngai; Chenshu Wu

arXiv:2410.22076 · cs.SD · May 20, 2025

TL;DR

USpeech is a cross-modal framework that synthesizes ultrasound data for speech enhancement with minimal human effort. It uses audio as a bridge between the visual and ultrasonic modalities, addressing the data scarcity and heterogeneity that limit ultrasound-based speech enhancement.

Contribution

The paper presents a two-stage framework combining contrastive pre-training and ultrasound synthesis to improve ultrasound-based speech enhancement without extensive data collection.
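As a rough illustration of what a contrastive pre-training objective can look like, the sketch below computes a symmetric InfoNCE loss over a batch of paired embeddings. This is a generic formulation, not the loss confirmed by this page: the function names (`info_nce`, `l2_normalize`, `log_softmax`), the temperature value, and the choice of which two modalities are paired are all assumptions made for the example. It is written with the standard library only so it runs without dependencies; a real implementation would more likely use tensors in a framework such as PyTorch.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum_exp for x in row]


def info_nce(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    emb_a[i] and emb_b[i] are assumed to come from the same utterance
    (a positive pair); every other combination in the batch is treated
    as a negative. The temperature of 0.1 is an illustrative default.
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    n = len(a)

    # Cosine similarity between every a[i] and b[j], scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Direction a -> b: each row should put its mass on the diagonal entry.
    loss_ab = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Direction b -> a: same idea, applied to the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (loss_ab + loss_ba)


# Toy embeddings: correctly paired rows versus deliberately swapped rows.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With the toy inputs, the correctly paired batch yields a much smaller loss than the swapped one, which is the behavior a contrastive objective relies on: matched embeddings are pulled together and mismatched ones are pushed apart.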

Findings

1. Synthetic ultrasound data achieves comparable performance to physical data.
2. USpeech outperforms existing ultrasound speech enhancement methods.
3. Framework effectively overcomes data scarcity and heterogeneity challenges.

Abstract

Speech enhancement is crucial for ubiquitous human-computer interaction. Recently, ultrasound-based acoustic sensing has emerged as an attractive choice for speech enhancement because of its superior ubiquity and performance. However, due to inevitable interference from unexpected and unintended sources during audio-ultrasound data acquisition, existing solutions rely heavily on human effort for data collection and processing. This leads to significant data scarcity that limits the full potential of ultrasound-based speech enhancement. To address this, we propose USpeech, a cross-modal ultrasound synthesis framework for speech enhancement with minimal human effort. At its core is a two-stage framework that establishes the correspondence between visual and ultrasonic modalities by leveraging audio as a bridge. This approach overcomes challenges from the lack of paired video-ultrasound…

Code & Models

Repositories

aiot-lab/USpeech (PyTorch, official)
