On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

Yaqian Hao; Chenguang Hu; Yingying Gao; Shilei Zhang; Junlan Feng

arXiv:2406.18065·eess.AS·June 27, 2024


Open Access

TL;DR

This paper investigates the use of Energy-Based Models to improve the calibration of speech classification models, addressing overconfidence issues and enhancing decision reliability across multiple speech tasks.

Contribution

It introduces a joint EBM approach that combines discriminative and generative models to improve calibration in speech classification without losing accuracy.
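One common way to build a joint EBM from a classifier is to reinterpret its logits: the softmax over the logits gives the discriminative term p(y|x), while the negative log-sum-exp of the same logits is treated as an energy E(x), so that p(x) is proportional to exp(-E(x)). The summary above does not state that the paper uses exactly this construction, so the sketch below is an illustration under that assumption; the function names are hypothetical.

```python
import math


def logsumexp(logits):
    """Numerically stable log(sum(exp(z))) over a list of logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def class_probabilities(logits):
    """Discriminative view: p(y|x) = softmax of the logits."""
    lse = logsumexp(logits)
    return [math.exp(z - lse) for z in logits]


def energy(logits):
    """Generative view (assumed construction): E(x) = -logsumexp of the logits.

    Lower energy corresponds to higher unnormalized density exp(-E(x)).
    """
    return -logsumexp(logits)


# Hypothetical logits for a three-class speech task (for example, three languages).
example_logits = [2.0, 0.5, -1.0]
probs = class_probabilities(example_logits)
e_x = energy(example_logits)
```

Adding a constant to every logit leaves `class_probabilities` unchanged but shifts `energy`, which is why the same network can carry a classifier and a density model at once in this kind of construction.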

Findings

01. EBMs effectively improve calibration in speech models

02. The approach reduces overconfidence in classifiers

03. Calibration enhancement achieved across multiple speech tasks

Abstract

For speech classification tasks, deep learning models often achieve high accuracy but are poorly calibrated, typically in the form of overconfident predictions. Calibration matters because it is critical to the reliability of decision-making in deep learning systems. This study explores the effectiveness of Energy-Based Models (EBMs) in calibrating confidence for speech classification tasks by training a joint EBM that integrates a discriminative and a generative model, thereby enhancing the classifier's calibration and mitigating overconfidence. Experimental evaluations are conducted on three speech classification tasks: age, emotion, and language recognition. Our findings highlight the competitive performance of EBMs in calibrating speech classification models. This research emphasizes the potential of EBMs in speech classification…
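Calibration is often quantified with the Expected Calibration Error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin. The abstract does not say which metrics the authors report, so the following is only a minimal sketch of this widely used metric; the function name, the choice of 10 equal-width bins, and the sample values are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans (or 0/1) marking whether each prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Per-bin accumulators: [sample count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(bool(ok))

    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count:
            accuracy = n_correct / count
            mean_conf = conf_sum / count
            ece += (count / n) * abs(accuracy - mean_conf)
    return ece


# Illustrative values only: an overconfident classifier vs. a calibrated one.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
```

In this toy case the overconfident predictions claim 95% confidence at 50% accuracy, so the ECE is large, while the second set matches confidence to accuracy and scores near zero.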


Taxonomy

Topics: Speech Recognition and Synthesis

Methods: energy-based model