Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
Mohd Mujtaba Akhtar, Girish, Farhan Sheth, Muskaan Singh

TL;DR
This paper introduces SIGNAL, a hybrid framework that combines speech foundation models, graph neural networks, and open-set detection to attribute synthetic speech to its source generator and to flag speech from generators unseen during training, in support of forensic analysis.
Contribution
The paper presents the first unified approach integrating graph-based learning with open-set detection for synthetic speech attribution and detection.
Findings
SIGNAL improves attribution accuracy on diverse datasets.
Graph neural networks enhance reasoning over generator relationships.
Open-set detection effectively identifies unseen synthetic speech generators.
Abstract
We propose a unified framework that not only attributes synthetic speech to its source but also detects speech generated by synthesizers that were not encountered during training. This requires methods that move beyond simple detection to support both detailed forensic analysis and open-set generalization. To address this, we introduce SIGNAL, a hybrid framework that combines speech foundation models (SFMs) with graph-based modeling and open-set-aware inference. Our framework integrates Graph Neural Networks (GNNs) and a k-Nearest Neighbor (KNN) classifier, allowing it to capture meaningful relationships between utterances and recognize speech that does not belong to any known generator. It constructs a query-conditioned graph over generator class prototypes, enabling the GNN to reason over relationships among candidate generators, while the KNN branch supports open-set detection…
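The two branches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract is truncated and does not specify the edge weighting, the distance metric, the value of k, the rejection threshold, or how the branches are combined. Everything below is an assumption made for illustration, including the function names, the cosine-softmax edge weights, the parameter-free propagation step standing in for a trained GNN layer, and the 2-D toy embeddings standing in for SFM features.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Mean embedding per known generator class (assumed prototype definition)."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def query_conditioned_graph(query, protos):
    """Build a small graph: node 0 is the query, the rest are class prototypes.
    Edge weights are cosine similarities, softmax-normalised per row (assumption)."""
    nodes = np.vstack([query[None, :], protos])
    unit = nodes / np.linalg.norm(nodes, axis=1, keepdims=True)
    sim = unit @ unit.T
    weights = np.exp(sim - sim.max(axis=1, keepdims=True))
    adj = weights / weights.sum(axis=1, keepdims=True)
    return nodes, adj

def message_pass(nodes, adj):
    """One untrained propagation step; a stand-in for a learned GNN layer."""
    return adj @ nodes

def knn_open_set(query, embeddings, labels, k=3, threshold=1.0):
    """Majority vote over the k nearest training embeddings.
    Returns None (treated as an unseen generator) when the mean distance
    to those neighbours exceeds the hypothetical threshold."""
    dists = np.linalg.norm(embeddings - query, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest].mean() > threshold:
        return None
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy data: two known generators, 20 utterance embeddings each.
rng = np.random.default_rng(0)
train_x = np.vstack([
    rng.normal([3.0, 0.0], 0.1, size=(20, 2)),  # generator 0
    rng.normal([0.0, 3.0], 0.1, size=(20, 2)),  # generator 1
])
train_y = np.array([0] * 20 + [1] * 20)

classes, protos = class_prototypes(train_x, train_y)
known_query = np.array([3.05, 0.0])
unseen_query = np.array([-10.0, -10.0])

nodes, adj = query_conditioned_graph(known_query, protos)
updated = message_pass(nodes, adj)

known_pred = knn_open_set(known_query, train_x, train_y)
unseen_pred = knn_open_set(unseen_query, train_x, train_y)
```

In this toy setup the query near generator 0 is attributed to it, while the far-away query is rejected as coming from an unseen generator. A distance threshold is only one possible rejection rule; the paper may use a different criterion.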
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Hate Speech and Cyberbullying Detection
