Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0

Marianne de Heer Kloots; Willem Zuidema

arXiv:2407.03005·cs.CL·July 4, 2024

TL;DR

This study investigates whether Wav2Vec2.0 neural speech models show human-like phonotactic biases when categorizing ambiguous sounds, finding that the bias emerges in early Transformer layers and is amplified by ASR fine-tuning.

Contribution

The paper demonstrates that Wav2Vec2.0 models encode a phonotactic bias similar to that of human listeners and, using controlled stimuli, localizes it to early layers of the Transformer module.

Findings

01

Wav2Vec2.0 shows a bias towards the phonotactically admissible category when processing ambiguous sounds.

02

Bias emerges in early Transformer layers.

03

ASR fine-tuning amplifies the phonotactic bias, which is also present in fully self-supervised models.

Abstract

What do deep neural speech models know about phonology? Existing work has examined the encoding of individual linguistic units such as phonemes in these models. Here we investigate interactions between units. Inspired by classic experiments on human speech perception, we study how Wav2Vec2 resolves phonotactic constraints. We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts where only /l/, only /r/, or neither occur in English. Like humans, Wav2Vec2 models show a bias towards the phonotactically admissible category in processing such ambiguous sounds. Using simple measures to analyze model internals on the level of individual stimuli, we find that this bias emerges in early layers of the model's Transformer module. This effect is amplified by ASR finetuning but also present in fully self-supervised models. Our approach demonstrates how…
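To make the idea of a per-layer, per-stimulus bias measure concrete, here is a minimal sketch. It is not the measure used in the paper, which the abstract does not specify. It assumes that each layer yields one embedding for the same ambiguous sound in an /l/-only context and in an /r/-only context, plus one prototype embedding each for unambiguous /l/ and /r/. All function names, variable names, and toy vectors are hypothetical; with a real model, the embeddings could for example be taken from per-layer hidden states.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def l_preference(embedding, l_proto, r_proto):
    """Positive if the embedding is closer to the /l/ prototype, negative if closer to /r/."""
    return cosine(embedding, l_proto) - cosine(embedding, r_proto)


def layerwise_bias(l_context, r_context, l_protos, r_protos):
    """Hypothetical context-bias score per layer.

    For each layer, compare how /l/-like the same ambiguous sound looks
    in an /l/-only context versus an /r/-only context. A score near zero
    means context has no effect; a positive score means the representation
    shifts towards the phonotactically admissible category.
    """
    biases = []
    for emb_l, emb_r, lp, rp in zip(l_context, r_context, l_protos, r_protos):
        biases.append(l_preference(emb_l, lp, rp) - l_preference(emb_r, lp, rp))
    return biases


# Toy two-layer example with 2-dimensional embeddings (illustrative values only).
l_protos = [[1.0, 0.0], [1.0, 0.0]]   # unambiguous /l/ prototype per layer
r_protos = [[0.0, 1.0], [0.0, 1.0]]   # unambiguous /r/ prototype per layer
l_context = [[1.0, 1.0], [2.0, 1.0]]  # ambiguous sound in an /l/-only context
r_context = [[1.0, 1.0], [1.0, 2.0]]  # same sound in an /r/-only context

bias = layerwise_bias(l_context, r_context, l_protos, r_protos)
for layer, value in enumerate(bias):
    print(f"layer {layer}: context bias = {value:.3f}")
```

In this toy setup the first layer represents the ambiguous sound identically in both contexts, so its score is zero, while the second layer shifts towards the admissible category and gets a positive score. Plotting such a score across layers is one way to see where a context effect first appears.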

Code & Models

Repositories

mdhk/phonotactic-sensitivity (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Dense Connections