EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction

Rui Feng; Yuang Chen; Yu Hu; Jun Du; Jiahong Yuan

arXiv:2508.08924·eess.AS·August 13, 2025

Open Access

TL;DR

EGGCodec is a neural codec framework that improves electroglottography (EGG) signal reconstruction and F0 extraction accuracy by combining a multi-scale frequency-domain loss with a time-domain correlation loss. It also streamlines training by dropping the GAN discriminator, at the cost of only negligible performance degradation.

Contribution

EGGCodec introduces a multi-scale frequency-domain loss, a time-domain correlation loss, and a streamlined, discriminator-free training process for EGG signal reconstruction and F0 extraction, and it outperforms existing F0 extraction methods.

Findings

01. Reduced MAE from 14.14 Hz to 13.69 Hz in F0 extraction.

02. Reduced voicing decision error (VDE) by 38.2%.

03. Validated each component's contribution through ablation studies.
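
The two metrics named in the findings can be illustrated with a minimal sketch. It assumes a common convention that this page does not confirm for the paper: F0 tracks are per-frame arrays in Hz, a value of 0 marks an unvoiced frame, VDE is the fraction of frames whose voiced/unvoiced decision disagrees with the reference, and MAE is averaged over frames that both tracks mark as voiced. The function name `f0_metrics` and the toy values are hypothetical.

```python
import numpy as np

def f0_metrics(f0_ref, f0_est):
    """Return (MAE in Hz, VDE as a fraction) for two per-frame F0 tracks.

    Assumption: 0 Hz encodes an unvoiced frame in both tracks.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)

    voiced_ref = f0_ref > 0
    voiced_est = f0_est > 0

    # VDE: share of frames where the voicing decision differs.
    vde = float(np.mean(voiced_ref != voiced_est))

    # MAE: only over frames that both tracks consider voiced (assumed convention).
    both = voiced_ref & voiced_est
    if both.any():
        mae = float(np.mean(np.abs(f0_ref[both] - f0_est[both])))
    else:
        mae = float("nan")
    return mae, vde

# Toy example with made-up values (not data from the paper).
ref = [0.0, 100.0, 110.0, 120.0, 0.0]
est = [0.0, 102.0, 0.0, 117.0, 90.0]
mae, vde = f0_metrics(ref, est)
print(f"MAE = {mae:.2f} Hz, VDE = {vde:.1%}")
```

Lower is better for both quantities, which is why the findings are read as reductions.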

Abstract

This letter introduces EGGCodec, a robust neural Encodec framework engineered for electroglottography (EGG) signal reconstruction and F0 extraction. We propose a multi-scale frequency-domain loss function to capture the nuanced relationship between original and reconstructed EGG signals, complemented by a time-domain correlation loss to improve generalization and accuracy. Unlike conventional Encodec models that extract F0 directly from features, EGGCodec extracts F0 from the reconstructed EGG signals, which correspond more closely to F0. By removing the conventional GAN discriminator, we streamline EGGCodec's training process without compromising efficiency, incurring only negligible performance degradation. EGGCodec is trained on a widely used EGG-inclusive dataset, and extensive evaluations demonstrate that it outperforms state-of-the-art F0 extraction schemes, reducing mean absolute error (MAE) from…
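
To make the two loss terms in the abstract more concrete, here is a rough NumPy sketch of what a multi-scale frequency-domain loss and a time-domain correlation loss could look like. It is not the paper's formulation, which this page does not give. The sketch assumes an L1 distance between STFT magnitudes at several FFT sizes and one minus the Pearson correlation between waveforms. All function names, FFT sizes, the hop length, and the test signal are illustrative assumptions.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(x, y, fft_sizes=(64, 128, 256)):
    """Mean L1 distance between magnitude spectra, averaged over scales.

    Assumption: the scales and the L1 distance are illustrative choices.
    """
    losses = []
    for n_fft in fft_sizes:
        hop = n_fft // 4
        diff = stft_mag(x, n_fft, hop) - stft_mag(y, n_fft, hop)
        losses.append(np.mean(np.abs(diff)))
    return float(np.mean(losses))

def correlation_loss(x, y, eps=1e-8):
    """One minus the Pearson correlation of two waveforms (assumed form)."""
    xc = x - x.mean()
    yc = y - y.mean()
    r = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc) + eps)
    return float(1.0 - r)

# Synthetic stand-in for an EGG signal: a 120 Hz sinusoid at 16 kHz.
t = np.arange(2048) / 16000.0
target = np.sin(2 * np.pi * 120.0 * t)
rng = np.random.default_rng(0)
recon = target + 0.05 * rng.standard_normal(t.shape)

spec_loss = multi_scale_spectral_loss(target, recon)
corr_loss = correlation_loss(target, recon)
print(spec_loss, corr_loss)
```

In a training setup these terms would typically be computed on differentiable tensors and combined with weights. The sketch only shows the quantities themselves: both are zero for a perfect reconstruction and grow as the reconstruction drifts from the target.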

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition