Stabilized training of joint energy-based models and their practical applications

Martin Sustek; Samik Sadhu; Lukas Burget; Hynek Hermansky; Jesus Villalba; Laureano Moro-Velazquez; Najim Dehak

arXiv:2303.04187 · cs.LG · March 9, 2023 · 1 citation

TL;DR

This paper introduces a stabilized training method for joint energy-based models (JEM) that improves training stability across various data modalities and enables practical applications such as speech generation and denoising.

Contribution

The authors propose ST-JEM, a stabilization technique for SGLD-based JEM training, and add a regularization term to improve decision certainty, broadening JEM's applicability.
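
SGLD draws samples from an energy model by taking noisy gradient steps on the energy, x <- x - (step_size / 2) * dE/dx + sqrt(step_size) * noise. As a minimal sketch (not the authors' code), the Python below runs SGLD chains on a toy one-dimensional energy E(x) = (x - 2)^2 / 2, whose normalized density p(x) proportional to exp(-E(x)) is a unit-variance Gaussian centered at 2. The target, the step size of 0.1, the U(-1, 1) initialization, and the chain lengths are all illustrative assumptions. Long chains recover the target's mean and (approximately) its variance, while very short chains stay close to where they started, a simple picture of how SGLD can return low-quality samples.

```python
import math
import random
from typing import Callable, Tuple


def sgld_chain(grad_energy: Callable[[float], float], x0: float,
               step_size: float, n_steps: int, rng: random.Random) -> float:
    """Run one SGLD chain targeting p(x) proportional to exp(-E(x)).

    Update: x <- x - (step_size / 2) * dE/dx + sqrt(step_size) * N(0, 1).
    """
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - 0.5 * step_size * grad_energy(x) + math.sqrt(step_size) * noise
    return x


# Toy target (illustrative assumption): E(x) = (x - MU)^2 / 2, so p(x) = N(MU, 1).
MU = 2.0


def grad_energy(x: float) -> float:
    """dE/dx for the toy quadratic energy."""
    return x - MU


def sample_stats(n_chains: int, n_steps: int, step_size: float = 0.1,
                 seed: int = 0) -> Tuple[float, float]:
    """Mean and variance of the final states of independent chains started from U(-1, 1)."""
    rng = random.Random(seed)
    xs = [sgld_chain(grad_energy, rng.uniform(-1.0, 1.0), step_size, n_steps, rng)
          for _ in range(n_chains)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


long_mean, long_var = sample_stats(n_chains=2000, n_steps=200)
short_mean, short_var = sample_stats(n_chains=2000, n_steps=5)
print(f"200 steps: mean={long_mean:.2f}, var={long_var:.2f}")
print(f"  5 steps: mean={short_mean:.2f}, var={short_var:.2f}")
```

With a finite step size the chain's stationary variance is slightly above the target's (about 1.03 here), which is the usual discretization bias of unadjusted Langevin updates.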

Findings

1. Stabilized training enables JEM to be trained on speech data.
2. The approach improves the quality of generated speech.
3. JEM can be used for speech denoising and other applications.

Abstract

The recently proposed Joint Energy-based Model (JEM) interprets a discriminatively trained classifier $p(y \mid x)$ as an energy model, which is also trained as a generative model describing the distribution of the input observations $p(x)$. JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD). Unfortunately, SGLD often fails to deliver negative samples of sufficient quality during standard JEM training, which causes a very unbalanced contribution from the positive and negative examples when calculating gradients for JEM updates. As a consequence, standard JEM training is quite unstable, requiring careful tuning of hyper-parameters and frequent restarts when the training starts diverging. This…
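
In JEM, the classifier logits f(x) serve two roles: their softmax gives p(y | x), and the same logits define an energy E(x) = -log sum_y exp(f(x)[y]) with p(x) proportional to exp(-E(x)). The gradient of the generative part of training is then an average over positive examples minus an average over negative examples. The sketch below is illustrative only: it replaces the neural network with a hypothetical linear classifier f(x) = W x + b with made-up weights, and it uses the contrastive surrogate loss mean E(positives) - mean E(negatives), whose parameter gradient matches the maximum-likelihood gradient when the negatives are true model samples. It is an assumption for illustration, not the paper's exact objective or code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def logsumexp(v: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def logits(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Hypothetical toy classifier f(x) = W x + b, standing in for a neural network."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def class_posterior(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Discriminative view: p(y | x) = softmax(f(x))."""
    f = logits(W, b, x)
    lse = logsumexp(f)
    return [math.exp(fi - lse) for fi in f]


def energy(W: Matrix, b: Vector, x: Sequence[float]) -> float:
    """Generative view: E(x) = -log sum_y exp(f(x)[y]), so p(x) is proportional to exp(-E(x))."""
    return -logsumexp(logits(W, b, x))


def generative_grad(W: Matrix, b: Vector, positives: List[Vector],
                    negatives: List[Vector]) -> Tuple[Matrix, Vector]:
    """Gradient of mean E(positives) - mean E(negatives) w.r.t. W and b.

    Uses dE/dW[i][j] = -p(i | x) * x[j] and dE/db[i] = -p(i | x).
    Positives push the energy of data down; negatives push the energy of
    model samples up.
    """
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    for batch, sign in ((positives, 1.0), (negatives, -1.0)):
        for x in batch:
            p = class_posterior(W, b, x)
            for i, pi in enumerate(p):
                gb[i] -= sign * pi / len(batch)
                for j, xj in enumerate(x):
                    gW[i][j] -= sign * pi * xj / len(batch)
    return gW, gb


# Made-up parameters: 3 classes, 2-D inputs.
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]
b = [0.1, 0.0, -0.2]
data = [[0.5, 1.0], [1.5, -0.3]]

print("p(y|x):", class_posterior(W, b, data[0]), "E(x):", energy(W, b, data[0]))
# Negatives identical to the positives: the two terms cancel.
gW, gb = generative_grad(W, b, positives=data, negatives=data)
# Negatives far from the data: the terms no longer balance.
gW_bad, gb_bad = generative_grad(W, b, positives=data, negatives=[[-1.0, 2.0]])
print("balanced:", gb, "unbalanced:", gb_bad)
```

When the negatives coincide with the data, the positive and negative terms cancel; when the negatives land far from where the model actually puts its mass, the negative term stops offsetting the positive one. This is a toy version of the gradient imbalance described above.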

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications