A stylometric analysis of speaker attribution from speech transcripts
Cristina Aggazzotti, Elizabeth Allyn Smith

TL;DR
This paper introduces StyloSpeaker, a stylometric method for speaker attribution from transcribed speech, demonstrating its effectiveness and interpretability compared to neural models across different transcript formats and topic controls.
Contribution
The paper presents a novel stylometric approach for speaker attribution from speech transcripts, extending authorship analysis techniques to spoken content and evaluating its performance against neural methods.
Findings
Higher attribution accuracy on normalized transcripts
Performance varies with topic control levels
Stylometric features offer interpretability over neural models
Abstract
Forensic scientists often need to identify an unknown speaker or writer in cases such as ransom calls, covert recordings, alleged suicide notes, or anonymous online communications, among many others. Speaker recognition in the speech domain usually examines phonetic or acoustic properties of a voice, and these methods can be accurate and robust under certain conditions. However, if a speaker disguises their voice or employs text-to-speech software, vocal properties may no longer be reliable, leaving only their linguistic content available for analysis. Authorship attribution methods traditionally use syntactic, semantic, and related linguistic information to identify the writers of written text. In this paper, we apply a content-based authorship approach to speech that has been transcribed into text, using what a speaker says to attribute speech to individuals…
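To make the idea of content-based attribution concrete, here is a minimal sketch of one common stylometric baseline: represent each transcript by the relative frequencies of a few function words and fillers, then attribute an unknown transcript to the known speaker whose profile is most similar. This is not the StyloSpeaker method itself, whose features are not described in the text above. The word list, the function names (`features`, `cosine`, `attribute`), and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature list (function words and fillers).
# Real stylometric feature sets are much larger and more varied.
FUNCTION_WORDS = ["the", "a", "i", "you", "and", "but", "so",
                  "um", "uh", "like", "well", "of", "to", "it"]


def features(transcript: str) -> list[float]:
    """Relative frequency of each listed word in the transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown: str, known: dict[str, str]) -> str:
    """Return the known speaker whose profile is closest to the unknown transcript."""
    query = features(unknown)
    return max(known, key=lambda spk: cosine(query, features(known[spk])))


if __name__ == "__main__":
    # Toy transcripts invented for this example, not data from the paper.
    known = {
        "speaker_A": "um well i think like you know it was like um so good",
        "speaker_B": "the report of the committee and the findings of the board",
    }
    print(attribute("um i was like so tired well", known))
```

Because each feature is a named word frequency, the reasons for a match can be inspected directly (for example, heavy use of "um" and "like"), which is the kind of interpretability the findings above attribute to stylometric features.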
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling
