[b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

Kwanghee Choi; Eunjung Yeo; Cheol Jun Cho; David Harwath; David R. Mortensen

arXiv:2602.18899 · eess.AS · April 15, 2026

TL;DR

This paper shows that self-supervised speech models encode phonological features as linear, compositional vectors that support phonological vector arithmetic, based on a study across 96 languages.

Contribution

The paper demonstrates that self-supervised speech models (S3Ms) encode phonological information as interpretable linear vectors, and it introduces phonological vector arithmetic for speech representations.

Findings

01 Linear directions in the model's representation space correspond to phonological features.

02 The scale of a phonological vector correlates with the degree of acoustic realization of its feature.

03 Adding and scaling phonological vectors produces phonological continua.
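The third finding can be pictured with a toy sketch. The code below does not use a real speech model or audio: the embeddings are synthetic NumPy vectors built from hypothetical `voicing`, `alveolar`, and `bilabial` directions, and `voicedness` is an illustrative proxy (relative distance to the toy [p] and [b]), not a measure from the paper. Under those assumptions, scaling the [d] - [t] difference and adding it to [p] traces a path that moves steadily from [p] toward [b].

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16

# Hypothetical feature directions; real S3M representations are not used here.
voicing = rng.normal(size=dim)
alveolar = rng.normal(size=dim)
bilabial = rng.normal(size=dim)

def noisy(v):
    """Add small Gaussian noise so the toy phones are not exactly linear."""
    return v + 0.05 * rng.normal(size=dim)

# Toy phone embeddings: place direction, plus voicing for voiced stops.
t = noisy(alveolar)
d = noisy(alveolar + voicing)
p = noisy(bilabial)
b = noisy(bilabial + voicing)

# Voicing vector estimated from the alveolar pair only.
voicing_vector = d - t

# Scale the vector from 0 (plain [p]) to 1 (full voicing shift).
scales = np.linspace(0.0, 1.0, 5)
continuum = [p + s * voicing_vector for s in scales]

def voicedness(x):
    """Illustrative proxy: 0 at the toy [p], approaching 1 near the toy [b]."""
    to_p = np.linalg.norm(x - p)
    to_b = np.linalg.norm(x - b)
    return to_p / (to_p + to_b)

scores = [voicedness(x) for x in continuum]
for s, score in zip(scales, scores):
    print(f"scale={s:.2f}  voicedness={score:.3f}")
```

In this toy setting the proxy rises monotonically with the scale. With a real model, the analogous check would compare scaled representations against an acoustic measure of the feature, which this sketch does not attempt.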

Abstract

Self-supervised speech models (S3Ms) are known to encode rich phonetic information, yet how this information is structured remains underexplored. We conduct a comprehensive study across 96 languages to analyze the underlying structure of S3M representations, with particular attention to phonological vectors. We first show that there exist linear directions within the model's representation space that correspond to phonological features. We further demonstrate that the scale of these phonological vectors correlates with the degree of acoustic realization of their corresponding phonological features in a continuous manner. For example, the difference between [d] and [t] yields a voicing vector: adding this vector to [p] produces [b], while scaling it results in a continuum of voicing. Together, these findings indicate that S3Ms encode speech using phonologically interpretable and…
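The [b] = [d] - [t] + [p] example in the abstract can be sketched as a nearest-neighbor lookup. This is a minimal illustration on synthetic vectors, not the authors' pipeline: the `phones` dictionary, the feature directions, and the `nearest` helper are all hypothetical, and real experiments would replace them with pooled S3M representations of phone segments.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Hypothetical feature directions standing in for learned structure.
voicing = rng.normal(size=dim)
places = {name: rng.normal(size=dim) for name in ("alveolar", "bilabial", "velar")}

def noisy(v):
    """Add small Gaussian noise to a toy embedding."""
    return v + 0.05 * rng.normal(size=dim)

# Toy phone inventory: voiceless stop = place, voiced stop = place + voicing.
phones = {
    "t": noisy(places["alveolar"]),
    "d": noisy(places["alveolar"] + voicing),
    "p": noisy(places["bilabial"]),
    "b": noisy(places["bilabial"] + voicing),
    "k": noisy(places["velar"]),
    "g": noisy(places["velar"] + voicing),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query, exclude=()):
    """Return the phone whose toy embedding is most similar to the query."""
    candidates = {k: v for k, v in phones.items() if k not in exclude}
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

# Voicing vector from the alveolar pair, applied to the bilabial stop.
voicing_vector = phones["d"] - phones["t"]
predicted = phones["p"] + voicing_vector

# Exclude the three phones used to build the query, as is common in analogy tests.
result = nearest(predicted, exclude=("p", "d", "t"))
print(result)
```

Excluding the query phones is a design choice borrowed from word-analogy evaluation; whether the paper applies the same exclusion is not stated in the abstract.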

Code & Models

Repositories

juice500ml/phonetic-arithmetic (GitHub)
