Automatic audiovisual synchronisation for ultrasound tongue imaging

Aciel Eshky; Joanne Cleland; Manuel Sam Ribeiro; Eleanor Sugden; Korin Richmond; Steve Renals

arXiv:2105.15162·eess.AS·June 1, 2021

TL;DR

This paper presents a self-supervised neural network approach for automatically synchronizing ultrasound tongue imaging with speech audio. The model achieves over 92.4% accuracy on in-domain data, and on clinical data users preferred its output over hardware synchronization 79.3% of the time.

Contribution

The study introduces a neural network-based method for post-hoc audiovisual synchronization that works across diverse domains and remains usable when hardware synchronization is unreliable.

Findings

01. Achieved >92.4% accuracy on in-domain data

02. Users preferred model output over hardware synchronization 79.3% of the time

03. Demonstrated generalization to new clinical datasets

Abstract

Ultrasound tongue imaging is used to visualise the intra-oral articulators during speech production. It is utilised in a range of applications, including speech and language therapy and phonetics research. Ultrasound and speech audio are recorded simultaneously, and in order to correctly use this data, the two modalities should be correctly synchronised. Synchronisation is achieved using specialised hardware at recording time, but this approach can fail in practice resulting in data of limited usability. In this paper, we address the problem of automatically synchronising ultrasound and audio after data collection. We first investigate the tolerance of expert ultrasound users to synchronisation errors in order to find the thresholds for error detection. We use these thresholds to define accuracy scoring boundaries for evaluating our system. We then describe our approach for automatic…
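The abstract mentions using expert detection thresholds as accuracy scoring boundaries. A minimal sketch of how such scoring could work is given below, assuming a prediction counts as correct when its offset error falls inside an asymmetric tolerance window. The names `Tolerance`, `is_correct`, and `accuracy`, and the threshold values in the usage example, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tolerance:
    """Detection thresholds in milliseconds (hypothetical, not the paper's values)."""
    lower_ms: float  # most negative offset error still counted as correct
    upper_ms: float  # most positive offset error still counted as correct


def is_correct(predicted_ms: float, true_ms: float, tol: Tolerance) -> bool:
    """Return True if the offset error lies within the tolerance window."""
    error = predicted_ms - true_ms
    return tol.lower_ms <= error <= tol.upper_ms


def accuracy(predicted: Sequence[float], true: Sequence[float], tol: Tolerance) -> float:
    """Fraction of utterances whose predicted offset is within tolerance."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true offsets must have the same length")
    if not predicted:
        raise ValueError("at least one utterance is required")
    hits = sum(is_correct(p, t, tol) for p, t in zip(predicted, true))
    return hits / len(predicted)
```

For example, with an assumed window of -50 ms to +30 ms, the errors below are 0, 20, -60, and -5 ms, so three of the four predictions count as correct:

```python
tol = Tolerance(lower_ms=-50.0, upper_ms=30.0)  # illustrative thresholds only
print(accuracy([100.0, 120.0, 40.0, 95.0], [100.0, 100.0, 100.0, 100.0], tol))  # 0.75
```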
