Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding
Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

TL;DR
This paper introduces a psychoacoustic calibration method for neural audio codecs, enabling smaller models to produce perceptually similar audio quality to larger models and traditional codecs by incorporating human hearing thresholds into the loss function.
Contribution
It presents a novel psychoacoustic loss function that improves neural audio coding efficiency and perceptual quality with reduced model complexity.
Findings
The proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits.
A lightweight neural codec achieves near-transparent quality at 112 kbps.
The method reduces model size by over 50% while maintaining high perceptual fidelity.
Abstract
Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems so that they can decode signals more perceptually similar to the reference, yet with a much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms the baseline neural codec twice as large and consuming 23.4% more bits per…
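To make the idea of a masking-aware loss concrete, here is a minimal sketch of one way a global masking threshold could reweight a spectral reconstruction error. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `masking_weighted_loss`, the log-ratio weighting, and the premise that the signal power spectral density and masking threshold are already available in dB per frequency bin are all hypothetical choices made for this example. Computing the threshold itself (for instance with a standard psychoacoustic model) is outside the sketch.

```python
import numpy as np


def masking_weighted_loss(ref_spec, dec_spec, signal_psd_db, mask_db):
    """Squared spectral error, down-weighted where the error would be masked.

    All four arguments are arrays of the same shape (one value per
    frequency bin). `signal_psd_db` and `mask_db` are assumed to be
    precomputed by some psychoacoustic model; this sketch does not
    compute them.
    """
    ref_spec = np.asarray(ref_spec, dtype=float)
    dec_spec = np.asarray(dec_spec, dtype=float)
    signal_psd_db = np.asarray(signal_psd_db, dtype=float)
    mask_db = np.asarray(mask_db, dtype=float)

    # Signal-to-mask ratio as a linear power ratio. It is large where the
    # signal sits well above the masking threshold, and close to zero
    # where the signal lies below the threshold.
    smr_linear = 10.0 ** (0.1 * (signal_psd_db - mask_db))

    # Assumed weighting: log10(1 + SMR). Bins below the threshold get a
    # weight near zero, so errors there barely contribute to the loss.
    weights = np.log10(1.0 + smr_linear)

    squared_error = (ref_spec - dec_spec) ** 2
    return float(np.sum(weights * squared_error))


if __name__ == "__main__":
    ref = np.ones(4)
    psd_db = np.array([60.0, 60.0, 0.0, 0.0])    # bins 0-1 loud, bins 2-3 quiet
    mask_db = np.array([20.0, 20.0, 40.0, 40.0])  # bins 2-3 lie below the mask

    err_audible = ref + np.array([0.1, 0.0, 0.0, 0.0])
    err_masked = ref + np.array([0.0, 0.0, 0.0, 0.1])

    print("error in audible bin:", masking_weighted_loss(ref, err_audible, psd_db, mask_db))
    print("error in masked bin: ", masking_weighted_loss(ref, err_masked, psd_db, mask_db))
```

With this kind of weighting, the same error magnitude costs far more in a bin where the signal exceeds the threshold than in a bin where it is masked, so a model trained on it is free to spend its limited capacity on the audible part of the spectrum.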
