Rhythm Features for Speaker Identification

Nick Mehlman; Thomas Thebaud; Dani Byrd; Shri Narayanan

arXiv:2506.06834 · eess.AS · June 10, 2025

Open Access

TL;DR

This paper tests rhythm features as a cue for speaker identification, finding them useful but limited by high intra-subject variability in ad-hoc speech.

Contribution

The paper applies deep learning methods to rhythm features for speaker recognition and evaluates their effectiveness in a text-independent setting.

Findings

01

Rhythm features improve speaker recognition accuracy.

02

High intra-subject variability affects rhythm-based identification.

03

Rhythmic information complements traditional audio features.

Abstract

While deep learning models have demonstrated robust performance in speaker recognition tasks, they primarily rely on low-level audio features learned empirically from spectrograms or raw waveforms. However, prior work has indicated that idiosyncratic speaking styles heavily influence the temporal structure of linguistic units in speech signals (rhythm). This makes rhythm a strong yet largely overlooked candidate for a speech identity feature. In this paper, we test this hypothesis by applying deep learning methods to perform text-independent speaker identification from rhythm features. Our findings support the usefulness of rhythmic information for speaker recognition tasks but also suggest that high intra-subject variability in ad-hoc speech can degrade its effectiveness.
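To make the notion of rhythm as "the temporal structure of linguistic units" concrete, here is a minimal sketch that summarizes a sequence of unit durations with two duration-variability measures, the normalized Pairwise Variability Index (nPVI) and a coefficient of variation (Varco). This is an illustrative assumption only: the abstract does not state which rhythm features the authors use, the function names `npvi` and `rhythm_summary` are hypothetical, and the input durations are assumed to come from some external segmentation (for example, syllable or vowel intervals in seconds).

```python
from statistics import mean, pstdev


def npvi(durations):
    """Normalized Pairwise Variability Index over successive durations.

    Averages the absolute difference between each adjacent pair of
    durations, normalized by that pair's mean, and scales by 100.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


def rhythm_summary(durations):
    """Hypothetical helper: a small dictionary of duration statistics.

    Not the paper's feature set; just one plausible hand-crafted summary.
    """
    avg = mean(durations)
    return {
        "mean_duration": avg,
        # Varco: population standard deviation as a percentage of the mean.
        "varco": 100.0 * pstdev(durations) / avg,
        "npvi": npvi(durations),
    }


if __name__ == "__main__":
    # Made-up interval durations in seconds, for illustration only.
    example = [0.12, 0.30, 0.09, 0.25, 0.11]
    print(rhythm_summary(example))
```

Statistics like these collapse an utterance into a few numbers; a deep learning approach such as the one the abstract describes would more plausibly consume the duration sequence itself, but that detail is not given here.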

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing