# Speaker-independent classification of phonetic segments from raw ultrasound in child speech

**Authors:** Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

arXiv: 1907.01413 · 2019-07-03

## TL;DR

This paper explores automatic classification of phonetic segments from raw ultrasound tongue images in child speech, focusing on generalization across different speakers and proposing methods to improve speaker-independent performance.

## Contribution

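The paper investigates the classification of phonetic segments (tongue shapes) from raw ultrasound tongue images under four training scenarios: speaker-dependent, multi-speaker, speaker-independent, and speaker-adapted. It shows that minimal additional speaker information, such as the mean ultrasound frame, helps models generalize to unseen speakers.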

## Key findings

- Models underperform on data from speakers not seen at training time.
- Providing minimal additional speaker information, such as the mean ultrasound frame, improves generalization to unseen speakers.
- Speaker-independent classification benefits from speaker-specific cues.
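
Measuring performance on unseen speakers requires that no test speaker contributes any training data. The following is a minimal sketch of such a split, not the authors' code; the function name `speaker_independent_split` and the speaker labels are hypothetical, and this summary does not describe the paper's actual data partitioning.

```python
import numpy as np

def speaker_independent_split(speaker_ids, held_out):
    """Return boolean (train, test) masks with held-out speakers only in test.

    speaker_ids: one speaker label per example (hypothetical labels).
    held_out: collection of speaker labels reserved for testing.
    """
    speaker_ids = np.asarray(speaker_ids)
    test_mask = np.isin(speaker_ids, list(held_out))
    return ~test_mask, test_mask

# Illustrative labels only: six examples from four speakers.
speaker_ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
train_mask, test_mask = speaker_independent_split(speaker_ids, held_out={"s3", "s4"})
```

By contrast, a multi-speaker setup would split the examples of each speaker between training and test, so every test speaker is also seen during training.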

## Abstract

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production. UTI is increasingly being used for speech therapy, making it important to develop automatic methods to assist various time-consuming manual tasks currently performed by speech therapists. A key challenge is to generalize the automatic processing of ultrasound tongue images to previously unseen speakers. In this work, we investigate the classification of phonetic segments (tongue shapes) from raw ultrasound recordings under several training scenarios: speaker-dependent, multi-speaker, speaker-independent, and speaker-adapted. We observe that models underperform when applied to data from speakers not seen at training time. However, when provided with minimal additional speaker information, such as the mean ultrasound frame, the models generalize better to unseen speakers.
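
As an illustration of the "mean ultrasound frame" idea, the sketch below computes the average frame over one speaker's recordings and pairs it with every frame from that speaker. This is an assumption-laden example: the abstract does not say how the mean frame is fed to the model, so stacking it as a second input channel is only one plausible choice, and the function name `append_speaker_mean` and the array shapes are hypothetical.

```python
import numpy as np

def append_speaker_mean(frames):
    """Pair each frame with the speaker's mean frame as a second channel.

    frames: array of shape (N, H, W) holding N frames from a single speaker.
    Returns an array of shape (N, 2, H, W): channel 0 is the original frame,
    channel 1 is the mean over all N frames (identical for every example).
    """
    frames = np.asarray(frames, dtype=np.float32)
    mean_frame = frames.mean(axis=0)                        # (H, W)
    mean_stack = np.broadcast_to(mean_frame, frames.shape)  # (N, H, W), read-only view
    return np.stack([frames, mean_stack], axis=1)           # (N, 2, H, W)

# Random data standing in for raw ultrasound frames (shape is illustrative).
rng = np.random.default_rng(0)
frames = rng.random((5, 4, 3)).astype(np.float32)
out = append_speaker_mean(frames)
```

Because the mean needs no labels, it can be computed for a new speaker from unlabelled recordings alone, which is what makes it a cheap form of speaker information.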

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01413/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01413/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1907.01413/full.md

---
Source: https://tomesphere.com/paper/1907.01413