Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations

Girish; Mohd Mujtaba Akhtar; Orchid Chetia Phukan; Drishti Singh; Swarup Ranjan Behera; Pailla Balakrishna Reddy; Arun Balaji Buduru; Rajesh Sharma

arXiv:2506.01157·eess.AS·June 3, 2025

Open Access

TL;DR

This paper introduces TRIO, a novel framework that fuses paralinguistic and speaker recognition pre-trained speech representations to improve source tracing of synthetic speech systems, achieving state-of-the-art results.

Contribution

The paper proposes a fusion method that combines representations from paralinguistic and speaker recognition pre-trained models using a gated mechanism and a canonical correlation analysis (CCA) loss, improving source attribution for synthetic speech.
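
The gated fusion and correlation-based alignment described above might look roughly like the following NumPy sketch. It is not the authors' implementation: the function names, the gate parameterisation, the embedding size, the random stand-in embeddings, and the per-dimension correlation loss (a simplified substitute for a full CCA loss) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(a, b, W_g, b_g):
    """Fuse two (batch, dim) embeddings with a learned element-wise gate.

    The gate is computed from the concatenation of both inputs, so each
    output dimension is a convex combination of the two representations.
    """
    gate = sigmoid(np.concatenate([a, b], axis=1) @ W_g + b_g)  # (batch, dim)
    return gate * a + (1.0 - gate) * b


def correlation_loss(a, b, eps=1e-8):
    """Negative mean per-dimension Pearson correlation across the batch.

    Assumption: a simplified stand-in for a CCA loss. It rewards
    dimension-wise alignment only, not full canonical correlation.
    """
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    num = (a_c * b_c).sum(axis=0)
    den = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0)) + eps
    return -float(np.mean(num / den))


rng = np.random.default_rng(0)
batch, dim = 8, 16  # hypothetical sizes

# Hypothetical stand-ins for projected paralinguistic and speaker embeddings.
para = rng.normal(size=(batch, dim))
spk = rng.normal(size=(batch, dim))

# Hypothetical gate parameters; in practice these would be trained.
W_g = rng.normal(scale=0.1, size=(2 * dim, dim))
b_g = np.zeros(dim)

fused = gated_fusion(para, spk, W_g, b_g)
print(fused.shape)  # (8, 16)
print(correlation_loss(para, spk))
```

In a training setup, a loss of this kind would typically be added to the classification loss so that the two representations are pulled into alignment before or while they are fused; how the terms are weighted is not stated in this summary.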

Findings

01. TRIO outperforms individual SPTMs and baseline fusion methods.

02. Fusing TRILLsson and x-vector representations improves source tracing accuracy.

03. The approach sets a new state of the art in synthetic speech source tracing.

Abstract

In this work, we focus on source tracing of synthetic speech generation systems (STSGS). Each source embeds distinctive paralinguistic features, such as pitch, tone, rhythm, and intonation, into its synthesized speech, reflecting the underlying design of the generation model. While previous research has explored representations from speech pre-trained models (SPTMs), representations from SPTMs pre-trained for paralinguistic speech processing, which excel in paralinguistic tasks such as synthetic speech detection and speech emotion recognition, have not been investigated for STSGS. We hypothesize that representations from a paralinguistic SPTM will be more effective because its paralinguistic pre-training enables it to capture source-specific paralinguistic cues. Our comparative study of representations from various SOTA SPTMs, including paralinguistic, monolingual,…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques

MethodsFocus