Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
Lian Remme, Kevin Tang

TL;DR
This paper introduces tabletop role-playing game audio as a new, challenging dataset for speaker diarization systems, highlighting the difficulties posed by voice impersonation and character voice alterations.
Contribution
The paper creates a novel TTRPG audio dataset and evaluates existing diarization systems, revealing their limitations in this unique conversational setting.
Findings
Higher confusion rates for diarizers on TTRPG audio
Wespeaker underestimates speaker count in TTRPG recordings
TTRPG audio presents a new challenge for diarization systems
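The "confusion" in these findings is one of the three standard components of diarization error rate (DER): missed speech, false alarm, and speaker confusion, each divided by the total reference speech time. The sketch below illustrates that decomposition on toy frame-level labels. It is not the paper's evaluation code: the function name `der_components`, the label names, and the toy data are hypothetical, and it assumes equal-length frames, no overlapping speech, and no forgiveness collar. It finds the best one-to-one speaker mapping by brute force, which is only practical for a handful of speakers.

```python
from collections import Counter
from itertools import permutations


def der_components(ref, hyp):
    """Frame-level DER breakdown (illustrative sketch, not the paper's scorer).

    ref, hyp: equal-length sequences with one speaker label per frame,
    or None for non-speech. Assumes no overlapping speech.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")

    # Frames where only one side has speech.
    miss = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Frames where both sides have speech: count label co-occurrences.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]
    counts = Counter(both)

    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    n = max(len(ref_spk), len(hyp_spk))
    # Pad with None so every one-to-one mapping is covered when counts differ.
    ref_pad = ref_spk + [None] * (n - len(ref_spk))
    hyp_pad = hyp_spk + [None] * (n - len(hyp_spk))

    # Brute-force the mapping that maximises correctly attributed frames.
    best = max(
        sum(counts[(r, h)] for r, h in zip(perm, hyp_pad))
        for perm in permutations(ref_pad)
    )
    confusion = len(both) - best

    return {
        "miss": miss / total,
        "false_alarm": false_alarm / total,
        "confusion": confusion / total,
        "der": (miss + false_alarm + confusion) / total,
    }


# Toy case: three reference speakers, but the hypothesis only finds two,
# so frames from the merged speakers count as confusion.
ref = ["A", "A", "A", "B", "B", "B", "C", "C", None, None]
hyp = ["s1", "s1", "s1", "s1", "s1", "s2", "s2", None, "s2", None]
result = der_components(ref, hyp)
```

In this toy case the hypothesis has fewer speakers than the reference, so some reference speakers cannot be mapped and their frames are scored as confusion. Production scoring tools work on timed segments rather than frames and also handle overlapping speech and collars, so their numbers will differ from this simplified version.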
Abstract
This paper provides a proof of concept that audio of tabletop role-playing games (TTRPGs) could serve as a challenge for diarization systems. TTRPGs are carried out mostly through conversation. Participants often alter their voices to indicate that they are speaking as a fictional character. Audio processing systems are susceptible to voice conversion, with or without technological assistance. TTRPGs present a conversational phenomenon in which voice conversion is an inherent characteristic of an immersive gaming experience. This could make it more challenging for diarizers to pick out the real speaker and determine that an impersonation is just that. We present the creation of a small TTRPG audio dataset and compare it against the AMI and ICSI corpora. The performance of two diarizers, pyannote.audio and wespeaker, was evaluated. We observed that TTRPGs' properties result in a higher confusion…
Taxonomy
Topics: Digital Games and Media
