Tonguescape: Exploring Language Models Understanding of Vowel Articulation

Haruki Sakajo; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe

arXiv:2501.17643 · cs.CL · January 30, 2025

TL;DR

This paper investigates whether vision-based language models can understand vowel articulation by associating tongue positions with speech sounds, using datasets derived from MRI images and videos.

Contribution

It introduces video and image datasets, built from existing real-time MRI recordings, for evaluating whether vision-language models associate tongue positions with vowels.

Findings

01 Models understand vowels better with reference examples

02 Models struggle to infer tongue positions without visual references

03 Potential for multimodal models to grasp speech articulation mechanisms

Abstract

Vowels are primarily characterized by tongue position. Humans have discovered these features of vowel articulation through their own experience and explicit objective observation such as using MRI. With this knowledge and our experience, we can explain and understand the relationship between tongue positions and vowels, and this knowledge is helpful for language learners to learn pronunciation. Since language models (LMs) are trained on a large amount of data that includes linguistic and medical fields, our preliminary studies indicate that an LM is able to explain the pronunciation mechanisms of vowels. However, it is unclear whether multi-modal LMs, such as vision LMs, align textual information with visual information. One question arises: do LMs associate real tongue positions with vowel articulation? In this study, we created video and image datasets from the existing real-time MRI…

Code & Models

Repositories

sj-h4/tonguescape-builder (Official)

Videos

Tonguescape: Exploring Language Models Understanding of Vowel Articulation (Underline)

Taxonomy

Topics: Phonetics and Phonology Research · Linguistic Variation and Morphology · Linguistics and Cultural Studies

Methods: ALIGN