PHONOS: PHOnetic Neutralization for Online Streaming Applications

Waris Quamer; Mu-Ruei Tseng; Ghady Nasrallah; Ricardo Gutierrez-Osuna

arXiv:2603.27001 · eess.AS · March 31, 2026

TL;DR

PHONOS is a real-time streaming module that neutralizes non-native accents in speech to strengthen speaker anonymization, using silence-aware DTW alignment and zero-shot voice conversion.

Contribution

The paper introduces a streaming approach to accent neutralization that operates with low latency and improves speaker anonymization by reducing accent cues.

Findings

01. 81% reduction in non-native accent confidence

02. Latency under 241 ms on a single GPU

03. Reduced speaker linkability in embedding space

Abstract

Speaker anonymization (SA) systems modify timbre while leaving regional or non-native accents intact, which is problematic because accents can narrow the anonymity set. To address this issue, we present PHONOS, a streaming module for real-time SA that neutralizes non-native accents so that the speech sounds native-like. Our approach pre-generates golden speaker utterances that preserve source timbre and rhythm but replace foreign segmentals with native ones using silence-aware DTW alignment and zero-shot voice conversion. These utterances supervise a causal accent translator that maps non-native content tokens to native equivalents with at most 40 ms look-ahead, trained using joint cross-entropy and CTC losses. Our evaluations show an 81% reduction in non-native accent confidence, with listening-test ratings consistent with this shift, and reduced speaker linkability as accent-neutralized utterances move…
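
The abstract names DTW alignment as the step that pairs source frames with native reference frames. As a rough illustration only, the sketch below implements classic dynamic time warping over two sequences of scalar features. It is not the paper's silence-aware variant, whose silence handling the abstract does not describe. The function name `dtw_path`, the scalar features, and the absolute-difference cost are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def dtw_path(
    src: Sequence[float], ref: Sequence[float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Plain DTW between two non-empty scalar sequences.

    Hypothetical helper for illustration: real systems would align
    frame-level feature vectors, and the paper's silence-aware
    handling is NOT modeled here.
    """
    n, m = len(src), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning src[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(src[i - 1] - ref[j - 1])  # assumed local distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match
                cost[i - 1][j],      # src frame repeats a ref frame
                cost[i][j - 1],      # ref frame repeats a src frame
            )

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
print(total, path)
```

In this toy call the second source frame is stretched across two reference frames, which is the kind of many-to-one pairing an alignment step produces when the two utterances differ in timing.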
