Speech Loudness in Broadcasting and Streaming
Matteo Torcoli, Mhd Modar Halimeh, Thomas Leitz, Yannik Grewe, Michael Kratschmer, Bernhard Neugebauer, Adrian Murtaza, Harald Fuchs, Emanuël A. P. Habets

TL;DR
This paper improves speech loudness measurement for broadcasting and streaming by using DNNs to isolate the speech signal, defines critical passages based on loudness deviations, and shows how these measures can help improve intelligibility and the user experience.
Contribution
The paper introduces a DNN-based method for accurate speech loudness estimation and defines new measures for identifying critical passages, i.e., passages in which speech is likely to be hard to understand.
Findings
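When clean speech is not available, DNN-based speech isolation yields more precise speech loudness estimates than speech-gated loudness.
Critical passages can be identified using the local Speech Loudness Deviation (SLD) and the SBLD measure.
Monitoring and controlling these measures can improve speech intelligibility.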
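The summary does not spell out how local SLD is computed or which thresholds make a passage critical, so the following is only a minimal Python sketch under stated assumptions. It assumes local SLD is the short-term loudness of the (clean or separated) speech in each analysis window minus the integrated speech loudness of the whole program, and that a critical passage is a run of speech windows whose SLD stays below a threshold for a minimum duration. The names `local_sld`, `critical_passages`, `Passage`, `threshold_lu` and `min_duration_s`, and their default values, are hypothetical; the SBLD measure is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Passage:
    start_s: float  # start time of the passage in seconds
    end_s: float    # end time of the passage in seconds
    min_sld: float  # lowest local SLD inside the passage, in LU


def local_sld(short_term_speech_lufs: List[Optional[float]],
              integrated_speech_lufs: float) -> List[Optional[float]]:
    """Assumed definition: per-window speech loudness minus the program's
    integrated speech loudness. None marks windows without speech."""
    return [None if level is None else level - integrated_speech_lufs
            for level in short_term_speech_lufs]


def critical_passages(sld: List[Optional[float]], hop_s: float,
                      threshold_lu: float = -10.0,
                      min_duration_s: float = 1.0) -> List[Passage]:
    """Merge consecutive speech windows whose SLD is below threshold_lu and
    keep runs lasting at least min_duration_s (each window counts as hop_s).
    The default threshold and minimum duration are hypothetical."""
    passages: List[Passage] = []
    start: Optional[int] = None
    # The +inf sentinel is never below the threshold, so it closes a trailing run.
    for i, value in enumerate(sld + [math.inf]):
        below = value is not None and value < threshold_lu
        if below and start is None:
            start = i
        elif not below and start is not None:
            if (i - start) * hop_s >= min_duration_s:
                passages.append(Passage(start * hop_s, i * hop_s, min(sld[start:i])))
            start = None
    return passages


if __name__ == "__main__":
    # Toy short-term speech loudness values (LUFS) on a 0.5 s hop.
    st = [-23.0, -24.0, -36.0, -37.0, -38.0, -23.0, None, -35.0, -23.0]
    print(critical_passages(local_sld(st, -23.0), hop_s=0.5))
    # prints [Passage(start_s=1.0, end_s=2.5, min_sld=-15.0)]
```

Windows without speech are marked `None` so that pauses are not flagged as critical; the single quiet window at 3.5 s is dropped because it is shorter than the assumed minimum duration.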
Abstract
The introduction and regulation of loudness in broadcasting and streaming brought clear benefits to the audience, e.g., a level of uniformity across programs and channels. Yet, speech loudness is frequently reported as being too low in certain passages, which can hinder the full understanding and enjoyment of movies and TV programs. This paper proposes expanding the set of loudness-based measures typically used in the industry. We focus on speech loudness, and we show that, when clean speech is not available, Deep Neural Networks (DNNs) can be used to isolate the speech signal and thereby to accurately estimate speech loudness, providing a more precise estimate than speech-gated loudness. Moreover, we define critical passages, i.e., passages in which speech is likely to be hard to understand. Critical passages are defined based on the local Speech Loudness Deviation (SLD) and the…
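Loudness in broadcasting and streaming is commonly measured following ITU-R BS.1770. As a rough illustration of what "speech loudness" means here, the sketch below computes BS.1770-style gated integrated loudness in pure Python, assuming a mono signal sampled at 48 kHz and ignoring multichannel weighting. Applied to clean speech, or to the output of a speech-separation DNN (not shown, and not the authors' model), it gives a speech loudness estimate; applied to the full mix, it gives program loudness. The function names are illustrative, and the paper's exact measurement setup is not given in this summary.

```python
import math
from typing import List, Sequence

FS = 48000  # the K-weighting coefficients below are the BS.1770 values for 48 kHz

# Stage 1: high-shelf pre-filter; stage 2: high-pass (a[0] = 1 for both).
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
HIGHPASS_B = (1.0, -2.0, 1.0)
HIGHPASS_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x: Sequence[float], b, a) -> List[float]:
    """Direct-form I biquad filter; assumes a[0] == 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def block_loudness(mean_square: float) -> float:
    """Loudness in LUFS of one K-weighted mono block."""
    return -0.691 + 10.0 * math.log10(mean_square)


def integrated_loudness(x: Sequence[float], fs: int = FS) -> float:
    """Gated integrated loudness (LUFS) of a mono signal, BS.1770-style sketch."""
    if fs != FS:
        raise ValueError("this sketch only has K-weighting coefficients for 48 kHz")
    z = biquad(biquad(x, SHELF_B, SHELF_A), HIGHPASS_B, HIGHPASS_A)
    block = fs * 400 // 1000  # 400 ms gating blocks
    hop = block // 4          # 75 % overlap between blocks
    powers = [
        sum(s * s for s in z[start:start + block]) / block
        for start in range(0, len(z) - block + 1, hop)
    ]
    # Absolute gate: drop silent blocks and blocks at or below -70 LUFS.
    gated = [p for p in powers if p > 0.0 and block_loudness(p) > -70.0]
    if not gated:
        return float("-inf")
    # Relative gate: drop blocks 10 LU or more below the absolute-gated loudness.
    relative_gate = block_loudness(sum(gated) / len(gated)) - 10.0
    gated = [p for p in gated if block_loudness(p) > relative_gate]
    return block_loudness(sum(gated) / len(gated))


if __name__ == "__main__":
    # Calibration check: a full-scale 997 Hz sine should measure about -3.01 LUFS.
    tone = [math.sin(2 * math.pi * 997 * n / FS) for n in range(FS)]
    print(round(integrated_loudness(tone), 2))
```

The -0.691 offset cancels the K-weighting gain at 997 Hz, which is why a full-scale sine lands near -3.01 LUFS. A production meter would use an optimized filtering library and support other sample rates and channel layouts.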
Taxonomy
Topics: Speech and Audio Processing
Methods: DNN-based speech separation
