EGGCodec: A Robust Neural Encodec Framework for EGG Reconstruction and F0 Extraction
Rui Feng, Yuang Chen, Yu Hu, Jun Du, Jiahong Yuan

TL;DR
EGGCodec is a neural codec framework that improves electroglottography (EGG) signal reconstruction and F0 extraction accuracy by combining a multi-scale frequency-domain loss with a time-domain correlation loss. Removing the GAN discriminator streamlines training at the cost of only negligible performance degradation.
Contribution
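It introduces a multi-scale frequency-domain loss, a time-domain correlation loss, and a discriminator-free training process for EGG signal reconstruction. F0 is extracted from the reconstructed EGG signal rather than directly from features, and the resulting system outperforms state-of-the-art F0 extraction schemes.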
Findings
Reduced MAE from 14.14 Hz to 13.69 Hz in F0 extraction.
Reduced voicing decision error (VDE) by 38.2%.
Validated each component's contribution through ablation studies.
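For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how F0 MAE and VDE are commonly computed from frame-level F0 tracks. It assumes that unvoiced frames are marked with 0 Hz and that MAE is averaged only over frames voiced in both tracks; these conventions and the function names `f0_mae` and `voicing_decision_error` are illustrative assumptions, not the paper's confirmed evaluation protocol.

```python
import numpy as np

def f0_mae(ref_f0, est_f0):
    """Mean absolute F0 error in Hz over frames voiced in both tracks.

    Assumption: unvoiced frames are encoded as 0 Hz.
    """
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    both_voiced = (ref > 0) & (est > 0)
    if not both_voiced.any():
        return float("nan")
    return float(np.mean(np.abs(ref[both_voiced] - est[both_voiced])))

def voicing_decision_error(ref_f0, est_f0):
    """Fraction of frames whose voiced/unvoiced decision differs."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    return float(np.mean((ref > 0) != (est > 0)))

# Toy example with made-up values (not data from the paper).
ref = [100.0, 0.0, 200.0, 150.0]
est = [110.0, 0.0, 0.0, 140.0]
mae = f0_mae(ref, est)
vde = voicing_decision_error(ref, est)
```

In relative terms, the reported MAE change from 14.14 Hz to 13.69 Hz is (14.14 - 13.69) / 14.14, or roughly 3.2%.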
Abstract
This letter introduces EGGCodec, a robust neural Encodec framework engineered for electroglottography (EGG) signal reconstruction and F0 extraction. We propose a multi-scale frequency-domain loss function to capture the nuanced relationship between original and reconstructed EGG signals, complemented by a time-domain correlation loss to improve generalization and accuracy. Unlike conventional Encodec models that extract F0 directly from features, EGGCodec leverages reconstructed EGG signals, which more closely correspond to F0. By removing the conventional GAN discriminator, we streamline EGGCodec's training process without compromising efficiency, incurring only negligible performance degradation. Trained on a widely used EGG-inclusive dataset, extensive evaluations demonstrate that EGGCodec outperforms state-of-the-art F0 extraction schemes, reducing mean absolute error (MAE) from…
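The two loss terms named in the abstract can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration: the FFT sizes, the hop of a quarter window, the L1 distance on magnitude spectra, and the use of one minus the Pearson correlation as the time-domain term. The paper's actual formulation and weighting may differ.

```python
import numpy as np

def stft_magnitude(x, n_fft, hop):
    """Magnitude STFT with a Hann window (requires len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(target, recon, n_ffts=(256, 512, 1024)):
    """Average L1 distance between magnitude spectra at several resolutions.

    The resolutions in n_ffts are hypothetical, not taken from the paper.
    """
    target = np.asarray(target, dtype=float)
    recon = np.asarray(recon, dtype=float)
    losses = []
    for n_fft in n_ffts:
        hop = n_fft // 4
        t_mag = stft_magnitude(target, n_fft, hop)
        r_mag = stft_magnitude(recon, n_fft, hop)
        losses.append(np.mean(np.abs(t_mag - r_mag)))
    return float(np.mean(losses))

def correlation_loss(target, recon, eps=1e-8):
    """One minus the Pearson correlation of the two waveforms."""
    t = np.asarray(target, dtype=float)
    r = np.asarray(recon, dtype=float)
    t = t - t.mean()
    r = r - r.mean()
    corr = np.sum(t * r) / (np.sqrt(np.sum(t ** 2) * np.sum(r ** 2)) + eps)
    return float(1.0 - corr)

# Synthetic 120 Hz tone at 16 kHz standing in for an EGG signal.
egg = np.sin(2 * np.pi * 120 * np.arange(4000) / 16000)
spec_same = multi_scale_spectral_loss(egg, egg)
spec_flipped = multi_scale_spectral_loss(egg, -egg)
corr_flipped = correlation_loss(egg, -egg)
```

In this sketch a polarity-inverted reconstruction leaves the magnitude spectra unchanged, so the spectral term cannot penalize it, while the correlation term does. That is one plausible reason a time-domain term complements a frequency-domain one.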
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition
