Towards Attribution of Generators and Emotional Manipulation in Cross-Lingual Synthetic Speech using Geometric Learning
Girish, Mohd Mujtaba Akhtar, Farhan Sheth, Muskaan Singh

TL;DR
This paper introduces MiCuNet, a novel geometric learning framework that combines semantic-prosodic cues and spectral dynamics to accurately trace emotional and manipulation sources in cross-lingual synthetic speech.
Contribution
It presents the first curvature-adaptive multitask framework for fine-grained attribution of emotion and manipulation in synthetic speech, integrating diverse auditory features.
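The multitask part of this contribution means one shared representation feeds several classification heads that are trained jointly. The sketch below shows one common way to do that: a weighted sum of per-task cross-entropy losses. It is an illustration under stated assumptions, not the authors' training objective. The task names `emotion` and `manipulation_source`, the function names, and the equal task weights are all hypothetical. The paper's full task list is truncated in the abstract and is not reproduced here.

```python
import math
from typing import Dict, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax of the target class, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    head_logits: Dict[str, Sequence[float]],
    targets: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task cross-entropy losses (hypothetical task names)."""
    return sum(
        weights[task] * cross_entropy(logits, targets[task])
        for task, logits in head_logits.items()
    )


# Usage: two hypothetical heads sharing one backbone representation.
loss = multitask_loss(
    head_logits={"emotion": [2.0, 0.5, 0.1], "manipulation_source": [0.2, 1.5]},
    targets={"emotion": 0, "manipulation_source": 1},
    weights={"emotion": 1.0, "manipulation_source": 1.0},
)
```

Fixed task weights are the simplest choice. Uncertainty-based or learned weighting are common alternatives when one task's loss dominates.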
Findings
MiCuNet outperforms conventional fusion strategies.
Effective in both English and Chinese subsets.
First to explore a curvature-adaptive framework for this task.
Abstract
In this work, we address the problem of fine-grained traceback of emotional and manipulation characteristics from synthetically manipulated speech. We hypothesize that combining semantic-prosodic cues captured by Speech Foundation Models (SFMs) with fine-grained spectral dynamics from auditory representations can enable more precise tracing of both the emotion and the manipulation source. To validate this hypothesis, we introduce MiCuNet, a novel multitask framework for fine-grained tracing of emotional and manipulation attributes in synthetically generated speech. Our approach integrates SFM embeddings with spectrogram-based auditory features through a mixed-curvature projection mechanism that spans hyperbolic, Euclidean, and spherical spaces and is guided by a learnable temporal gating mechanism. Our proposed method adopts a multitask learning setup to simultaneously predict original emotions,…
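The mixed-curvature projection with temporal gating can be illustrated with a small sketch. It assumes each frame is already a fused SFM-plus-spectrogram feature vector, maps every frame into a hyperbolic, a Euclidean, and a spherical space with the exponential map at the origin of the κ-stereographic model (κ < 0, κ = 0, κ > 0), and weights the three views per frame with a softmax gate. The gated views are concatenated and averaged over time. This is not the MiCuNet implementation: the function names, the dot-product gate, the fixed curvatures, and the mean pooling are hypothetical choices; in a trained model the gate vectors and curvatures would be learned parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expmap0(v: Sequence[float], kappa: float, max_angle: float = 1.5) -> Vector:
    """Exponential map at the origin of the kappa-stereographic model.

    kappa < 0: Poincare ball (hyperbolic); kappa == 0: Euclidean (identity);
    kappa > 0: stereographic sphere, with the angle clipped below pi/2 so tan()
    stays finite.
    """
    n = l2_norm(v)
    if n == 0.0 or kappa == 0.0:
        return list(v)
    s = math.sqrt(abs(kappa))
    if kappa < 0:
        scale = math.tanh(s * n) / (s * n)
    else:
        scale = math.tan(min(s * n, max_angle)) / (s * n)
    return [scale * x for x in v]


def gated_mixed_curvature(
    frames: Sequence[Sequence[float]],
    curvatures: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical gated projection: per-frame softmax over spaces, then mean pooling.

    frames: T x d fused features; curvatures: one kappa per space;
    gate_weights: one d-dimensional scoring vector per space.
    Returns a vector of length len(curvatures) * d.
    """
    k, d, t = len(curvatures), len(frames[0]), len(frames)
    pooled = [0.0] * (k * d)
    for h in frames:
        # Temporal gate: each frame gets its own mixture over the k spaces.
        alpha = softmax([sum(w_i * h_i for w_i, h_i in zip(w, h)) for w in gate_weights])
        for j, kappa in enumerate(curvatures):
            z = expmap0(h, kappa)
            for i in range(d):
                pooled[j * d + i] += alpha[j] * z[i] / t
    return pooled


# Usage: two 2-d frames projected into (hyperbolic, Euclidean, spherical) spaces.
out = gated_mixed_curvature(
    frames=[[0.2, -0.1], [0.4, 0.3]],
    curvatures=(-1.0, 0.0, 1.0),
    gate_weights=[[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
)
```

Concatenating the gated views keeps each geometry's coordinates separate, which avoids averaging points that live on manifolds with different curvature.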
Taxonomy
Topics: Emotion and Mood Recognition · Face recognition and analysis · Social Robot Interaction and HRI
