Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum

Mohammed Salah Al-Radhi; Riad Larbi; Mátyás Bartalis; Géza Németh

arXiv:2601.14472 · cs.SD · January 22, 2026

TL;DR

This paper introduces a neural vocoder that uses prosody-guided harmonic attention and direct complex spectrum prediction to improve phase coherence, pitch accuracy, and naturalness in speech synthesis.

Contribution

It proposes a vocoder architecture that jointly models magnitude and phase under prosody guidance, improving pitch accuracy and perceived speech quality over HiFi-GAN and AutoVocoder.

Findings

01. F0 RMSE reduced by 22%

02. Voiced/unvoiced error lowered by 18%

03. MOS improved by 0.15
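
The first two findings are relative reductions in pitch-tracking metrics. This listing does not give the exact metric definitions used in the paper, so the sketch below assumes a common convention: F0 contours in Hz with 0 marking unvoiced frames, F0 RMSE computed only over frames voiced in both contours, and voiced/unvoiced error as the fraction of frames whose voicing decision differs. The function names and all numeric inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def f0_rmse_and_vuv_error(f0_ref, f0_pred):
    """Return (F0 RMSE in Hz, V/UV error rate) for two frame-aligned contours.

    Assumed convention: a value of 0 marks an unvoiced frame.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_pred = np.asarray(f0_pred, dtype=float)
    voiced_ref = f0_ref > 0
    voiced_pred = f0_pred > 0

    # V/UV error: share of frames where the voicing decisions disagree.
    vuv_error = float(np.mean(voiced_ref != voiced_pred))

    # F0 RMSE: only over frames that both contours mark as voiced.
    both = voiced_ref & voiced_pred
    if both.any():
        rmse = float(np.sqrt(np.mean((f0_ref[both] - f0_pred[both]) ** 2)))
    else:
        rmse = float("nan")
    return rmse, vuv_error

def relative_reduction(baseline, proposed):
    """Fractional reduction of an error metric with respect to a baseline."""
    return (baseline - proposed) / baseline

# Toy contours (illustrative values only).
rmse, vuv = f0_rmse_and_vuv_error([100, 0, 200, 0], [103, 0, 196, 150])

# Made-up baseline and proposed errors that would correspond to a 22% reduction.
reduction = relative_reduction(10.0, 7.8)
```

With these toy inputs, one of four frames has a voicing mismatch, so the V/UV error is 0.25, and the RMSE over the two jointly voiced frames is about 3.54 Hz.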

Abstract

Neural vocoders are central to speech synthesis; despite their success, most still suffer from limited prosody modeling and inaccurate phase reconstruction. We propose a vocoder that introduces prosody-guided harmonic attention to enhance voiced segment encoding and directly predicts complex spectral components for waveform synthesis via inverse STFT. Unlike mel-spectrogram-based approaches, our design jointly models magnitude and phase, ensuring phase coherence and improved pitch fidelity. To further align with perceptual quality, we adopt a multi-objective training strategy that integrates adversarial, spectral, and phase-aware losses. Experiments on benchmark datasets demonstrate consistent gains over HiFi-GAN and AutoVocoder: F0 RMSE reduced by 22 percent, voiced/unvoiced error lowered by 18 percent, and MOS scores improved by 0.15. These results show that prosody-guided attention…
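
To illustrate why predicting complex spectral components matters, here is a minimal NumPy sketch of waveform synthesis by inverse STFT from real and imaginary parts. It is not the authors' implementation: an analysis STFT of a test tone stands in for the network output, and the Hann window, `n_fft=256`, `hop=64`, and the helper names `stft` and `istft` are all assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Analysis STFT with a periodic Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(real, imag, n_fft=256, hop=64):
    """Weighted overlap-add inverse STFT from real and imaginary parts."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(real + 1j * imag, n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Divide by the accumulated squared window; guard near-zero edge samples.
    return out / np.maximum(norm, 1e-8)

# Stand-in for a predicted complex spectrum: the STFT of a 220 Hz tone.
sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(4096) / sr)
S = stft(x)

# Using both components reconstructs the waveform away from the edges.
full = istft(S.real, S.imag)
err_full = np.max(np.abs(full[256:-256] - x[256:-256]))

# Keeping only the magnitude (zero phase) does not.
mag_only = istft(np.abs(S), np.zeros_like(S.real))
err_mag = np.max(np.abs(mag_only[256:-256] - x[256:-256]))
```

In this sketch `err_full` sits at floating-point precision while `err_mag` is large, which is the gap a mel-spectrogram pipeline has to close with separate phase estimation or a learned upsampler.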

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders