Rhythm Features for Speaker Identification
Nick Mehlman, Thomas Thebaud, Dani Byrd, Shri Narayanan

TL;DR
This paper explores rhythm features as a distinctive cue for speaker identification, showing that they are useful but that high intra-subject variability in ad-hoc speech can limit their effectiveness.
Contribution
The paper tests rhythm features as a speaker identity cue, applying deep learning methods to text-independent speaker identification.
Findings
Rhythm features improve speaker recognition accuracy.
High intra-subject variability in ad-hoc speech can degrade rhythm-based identification.
Rhythmic information complements traditional audio features.
Abstract
While deep learning models have demonstrated robust performance in speaker recognition tasks, they primarily rely on low-level audio features learned empirically from spectrograms or raw waveforms. However, prior work has indicated that idiosyncratic speaking styles heavily influence the temporal structure of linguistic units in speech signals (rhythm). This makes rhythm a strong yet largely overlooked candidate for a speech identity feature. In this paper, we test this hypothesis by applying deep learning methods to perform text-independent speaker identification from rhythm features. Our findings support the usefulness of rhythmic information for speaker recognition tasks but also suggest that high intra-subject variability in ad-hoc speech can degrade its effectiveness.
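To make "temporal structure of linguistic units" concrete, here is a minimal sketch of how duration-based rhythm statistics could be computed from a sequence of unit durations (for example, syllable or vowel lengths in seconds). The abstract does not say which rhythm features the authors use, so everything here is an illustrative assumption: the function names `npvi` and `rhythm_summary` are hypothetical, and the normalized Pairwise Variability Index (nPVI) is just one commonly used rhythm metric, not necessarily the one in the paper.

```python
from statistics import mean, pstdev


def npvi(durations):
    """Normalized Pairwise Variability Index over successive unit durations.

    Averages the absolute difference between each pair of neighbouring
    durations, normalized by the pair's mean, and scales the result by 100.
    Illustrative metric only; not taken from the paper.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


def rhythm_summary(durations):
    """Collect a few simple duration statistics into one feature dict."""
    return {
        "mean": mean(durations),
        "std": pstdev(durations),
        "npvi": npvi(durations),
    }


# Hypothetical durations (seconds) for five consecutive units.
features = rhythm_summary([0.12, 0.25, 0.10, 0.30, 0.15])
```

Perfectly even timing gives an nPVI of 0, and larger values indicate stronger alternation between long and short units. A feature vector like this, computed per utterance, is the kind of input a rhythm-based speaker classifier might consume under these assumptions.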
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
