Speech Loudness in Broadcasting and Streaming

Matteo Torcoli; Mhd Modar Halimeh; Thomas Leitz; Yannik Grewe; Michael Kratschmer; Bernhard Neugebauer; Adrian Murtaza; Harald Fuchs; Emanuël A. P. Habets

arXiv:2405.17364·eess.AS·May 28, 2024

TL;DR

This paper improves speech loudness measurement in broadcasting and streaming: it uses Deep Neural Networks (DNNs) to isolate the speech signal when clean speech is not available, defines critical passages based on local loudness deviations, and shows how controlling these measures can improve speech intelligibility and the user experience.

Contribution

It introduces a DNN-based method for accurate speech loudness estimation and defines new measures for identifying critical passages affecting intelligibility.

Findings

01. When clean speech is not available, DNN-based speech isolation gives a more precise speech loudness estimate than speech-gated loudness.

02. Critical passages can be identified using the local SLD and SBLD measures.

03. These measures can be controlled to enhance speech intelligibility.

Abstract

The introduction and regulation of loudness in broadcasting and streaming brought clear benefits to the audience, e.g., a level of uniformity across programs and channels. Yet, speech loudness is frequently reported as being too low in certain passages, which can hinder the full understanding and enjoyment of movies and TV programs. This paper proposes expanding the set of loudness-based measures typically used in the industry. We focus on speech loudness, and we show that, when clean speech is not available, Deep Neural Networks (DNNs) can be used to isolate the speech signal and so to accurately estimate speech loudness, providing a more precise estimate compared to speech-gated loudness. Moreover, we define critical passages, i.e., passages in which speech is likely to be hard to understand. Critical passages are defined based on the local Speech Loudness Deviation (SLD) and the…
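The idea of flagging passages where speech falls well below its overall level can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not give the exact SLD definition, so the code uses a plain mean-square level in dB as a stand-in for loudness (no K-weighting and no gating), and it assumes the speech signal has already been isolated, for example by a DNN separator that is not shown. The names `level_db`, `local_deviations`, `flag_quiet_passages`, and the parameters `window`, `hop`, and `max_drop_db` are hypothetical.

```python
import math


def level_db(samples):
    """Mean-square level in dB (stand-in for loudness; no K-weighting, no gating)."""
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def local_deviations(speech, window, hop):
    """Return (start_index, local_level - overall_level) for each analysis window."""
    overall = level_db(speech)
    deviations = []
    for start in range(0, len(speech) - window + 1, hop):
        local = level_db(speech[start:start + window])
        deviations.append((start, local - overall))
    return deviations


def flag_quiet_passages(speech, window, hop, max_drop_db):
    """Start indices of windows more than max_drop_db below the overall level.

    max_drop_db is a hypothetical threshold, not a value taken from the paper.
    """
    return [start
            for start, deviation in local_deviations(speech, window, hop)
            if deviation < -max_drop_db]


# Toy isolated-speech signal: a loud half followed by a much quieter half.
speech = [0.5] * 1000 + [0.05] * 1000
print(flag_quiet_passages(speech, window=1000, hop=1000, max_drop_db=6.0))  # [1000]
```

In the toy signal, the second window sits about 17 dB below the overall level, so it is the only one flagged. A measurement meant for production would replace `level_db` with a standards-based loudness measure.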

Taxonomy

Topics: Speech and Audio Processing

Methods: Sparse Evolutionary Training · Focus