Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy

Ke Xue; Rongfei Fan; Kai Li; Shanping Yu; Puning Zhao; Jianping An

arXiv:2602.00568 · cs.SD · February 3, 2026

TL;DR

DVPD is a lightweight speech enhancement model that exploits spectrograms as both visual textures and frequency representations, achieving state-of-the-art results with significantly fewer parameters and lower computational cost.

Contribution

The paper introduces DVPD, a novel dual-view diffusion model that leverages spectral compression and visual feature extraction for efficient speech enhancement.

Findings

01. Achieves state-of-the-art performance on benchmarks.

02. Uses only 35% of the parameters of SOTA models.

03. Reduces inference multiply-accumulate operations (MACs) by 60%.

Abstract

Diffusion models have recently set new benchmarks in Speech Enhancement (SE). However, most existing score-based models treat speech spectrograms merely as generic 2D images, applying uniform processing that ignores the intrinsic structural sparsity of audio, which results in inefficient spectral representation and prohibitive computational complexity. To bridge this gap, we propose DVPD, an extremely lightweight Dual-View Predictive Diffusion model, which uniquely exploits the dual nature of spectrograms as both visual textures and physical frequency-domain representations across both training and inference stages. Specifically, during training, we optimize spectral utilization via the Frequency-Adaptive Non-uniform Compression (FANC) encoder, which preserves critical low-frequency harmonics while pruning high-frequency redundancies. Simultaneously, we introduce a Lightweight…
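
The idea of keeping low-frequency detail while compressing high frequencies can be illustrated with a minimal sketch. This is not the paper's FANC encoder, whose design is not given in the text above. It is a hypothetical, fixed (non-learned) stand-in that assumes a magnitude spectrogram of shape (frequency, time). The function name `compress_frequency_axis` and the parameters `keep_bins` and `n_high_bands` are assumptions made for illustration only.

```python
import numpy as np


def compress_frequency_axis(spec, keep_bins, n_high_bands):
    """Non-uniformly compress the frequency axis of a spectrogram.

    Hypothetical illustration, NOT the paper's FANC encoder.
    spec: array of shape (n_freq, n_time), low frequencies first.
    keep_bins: number of low-frequency bins kept at full resolution.
    n_high_bands: number of averaged bands replacing the remaining bins.
    """
    n_freq = spec.shape[0]
    if not 0 < keep_bins < n_freq:
        raise ValueError("keep_bins must be in (0, n_freq)")
    if not 0 < n_high_bands <= n_freq - keep_bins:
        raise ValueError("n_high_bands must fit in the high-frequency range")

    low = spec[:keep_bins]    # low-frequency bins kept unchanged
    high = spec[keep_bins:]   # high-frequency bins to be pooled

    # Split the high range into nearly equal groups and average each group.
    groups = np.array_split(high, n_high_bands, axis=0)
    pooled = np.stack([g.mean(axis=0) for g in groups], axis=0)

    return np.concatenate([low, pooled], axis=0)


# Usage: 257 frequency bins (e.g. a 512-point STFT) and 100 frames.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))
compressed = compress_frequency_axis(spec, keep_bins=64, n_high_bands=32)
print(compressed.shape)  # (96, 100)
```

In this sketch the first 64 bins pass through untouched and the remaining 193 bins collapse into 32 averaged bands, so a downstream network would see 96 frequency rows instead of 257. A learned or frequency-adaptive scheme, as the abstract describes, would choose the compression differently.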

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation