Loud-loss: A Perceptually Motivated Loss Function for Speech Enhancement Based on Equal-Loudness Contours

Zixuan Li; Xueliang Zhang; Changjiang Zhao; Shuai Gao; Lei Miao; Zhipeng Yan; Ying Sun; Chong Zhu

arXiv:2511.05945 · cs.SD · November 11, 2025

TL;DR

This paper introduces a perceptually-weighted loss function for speech enhancement. It uses equal-loudness contours to weight errors in line with human auditory perception, and it improves perceptual quality over a conventional MSE loss.

Contribution

It proposes a psychoacoustically motivated loss function based on equal-loudness contours that improves the perceptual quality of speech enhancement models' output.

Findings

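01

Replacing MSE with the proposed loss in a GTCRN model raises the WB-PESQ score from 2.17 to 2.93 on the VoiceBank+DEMAND dataset.

02

The loss function is model-agnostic and flexible.

03

Frequency-dependent weighting of the reconstruction error improves the perceptual quality of the enhanced speech.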

Abstract

The mean squared error (MSE) is a ubiquitous loss function for speech enhancement, but it does not reflect perceived auditory quality. MSE causes models to over-emphasize low-frequency components, which have high energy, leading to inadequate modeling of perceptually important high-frequency information. To overcome this limitation, we propose a perceptually-weighted loss function grounded in psychoacoustic principles. Specifically, it leverages equal-loudness contours to assign frequency-dependent weights to the reconstruction error, thereby penalizing deviations in a way that aligns with human auditory sensitivity. The proposed loss is model-agnostic and flexible, demonstrating strong generality. Experiments on the VoiceBank+DEMAND dataset show that replacing MSE with our loss in a GTCRN model elevates the WB-PESQ score from 2.17 to 2.93-a…
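The idea of frequency-dependent error weighting can be illustrated with a minimal sketch. This is not the paper's loss: the abstract does not give the exact formula, so the sketch substitutes the standard A-weighting curve (itself derived from the 40-phon equal-loudness contour) for the authors' contour-based weights. The function names `a_weighting_gain` and `perceptually_weighted_mse`, the `(frames, bins)` magnitude-spectrogram layout, and the 16 kHz sample rate are all assumptions made for illustration.

```python
import numpy as np


def a_weighting_gain(freqs_hz):
    """Linear A-weighting gain (no dB offset), used here as a stand-in
    for equal-loudness-derived weights. Assumption: not the paper's curve."""
    f2 = np.asarray(freqs_hz, dtype=np.float64) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return num / den


def perceptually_weighted_mse(est_mag, ref_mag, sample_rate=16000):
    """Squared error between magnitude spectrograms of shape (frames, bins),
    weighted per frequency bin. Hypothetical helper for illustration only."""
    est_mag = np.asarray(est_mag, dtype=np.float64)
    ref_mag = np.asarray(ref_mag, dtype=np.float64)
    n_bins = est_mag.shape[-1]
    # Bin centre frequencies from 0 Hz up to the Nyquist frequency.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    weights = a_weighting_gain(freqs)
    # Normalise to mean 1 so the loss stays on a scale comparable to plain MSE.
    weights = weights / weights.mean()
    return float(np.mean(weights * (est_mag - ref_mag) ** 2))
```

With these weights, an error of the same size costs more in a bin near 2.5 kHz than in a bin near 100 Hz, which is the opposite of the low-frequency bias the abstract attributes to plain MSE. In a training setup the same weighting would be written with the framework's differentiable tensor operations instead of NumPy.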

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques