Cold Fusion: Training Seq2Seq Models Together with Language Models
Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates

TL;DR
This paper introduces Cold Fusion, a method that integrates a pre-trained language model into Seq2Seq training. On speech recognition it speeds up convergence, improves generalization, and transfers almost completely to a new domain using less than 10% of the labeled training data.
Contribution
The paper proposes Cold Fusion, an approach that combines a pre-trained language model with a Seq2Seq model during training, yielding faster convergence and better transfer to new domains.
Findings
Faster convergence of Seq2Seq models with Cold Fusion.
Enhanced domain transfer with less labeled data.
Improved generalization in speech recognition tasks.
Abstract
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks which involve generating natural language sentences such as machine translation, image captioning and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. In this work, we present the Cold Fusion method, which leverages a pre-trained language model during training, and show its effectiveness on the speech recognition task. We show that Seq2Seq models with Cold Fusion are able to better utilize language information enjoying i) faster convergence and better generalization, and ii) almost complete transfer to a new domain while using less than 10% of the labeled training data.
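To make "leverages a pre-trained language model during training" concrete, here is a minimal pure-Python sketch of one decoding step of a gated fusion head in the spirit of Cold Fusion. The LM's output logits are projected to a hidden vector. A sigmoid gate computed from the decoder state and that vector scales it element-wise. The concatenation of the decoder state and the gated LM vector then feeds a small ReLU layer and a softmax over the vocabulary. The class name `ColdFusionHead`, the layer sizes, the single affine projection of the LM logits, and the random initialization are illustrative assumptions, not the authors' code. Training is omitted, and the LM logits are simply passed in as a fixed input.

```python
import math
import random

def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def init(rows, cols, rng, scale=0.1):
    # Hypothetical small Gaussian initialization.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class ColdFusionHead:
    """Hypothetical fusion head: decoder state + LM logits -> next-token probs."""

    def __init__(self, dec_dim, vocab_size, lm_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Projection of LM logits to a hidden vector (assumed single affine layer).
        self.W_lm, self.b_lm = init(lm_dim, vocab_size, rng), [0.0] * lm_dim
        # Gate over [decoder state; projected LM vector], one gate per LM feature.
        self.W_g, self.b_g = init(lm_dim, dec_dim + lm_dim, rng), [0.0] * lm_dim
        # Output network: one ReLU layer, then vocabulary logits.
        self.W_h, self.b_h = init(hidden_dim, dec_dim + lm_dim, rng), [0.0] * hidden_dim
        self.W_o, self.b_o = init(vocab_size, hidden_dim, rng), [0.0] * vocab_size

    def step(self, s_t, lm_logits):
        s_lm = add(matvec(self.W_lm, lm_logits), self.b_lm)
        # s_t + s_lm is list concatenation, i.e. [s_t; s_lm].
        g = sigmoid(add(matvec(self.W_g, s_t + s_lm), self.b_g))
        # Fused state: decoder state concatenated with the gated LM vector.
        s_cf = s_t + [gi * si for gi, si in zip(g, s_lm)]
        h = [max(0.0, x) for x in add(matvec(self.W_h, s_cf), self.b_h)]
        r = add(matvec(self.W_o, h), self.b_o)
        return softmax(r), g
```

For example, `ColdFusionHead(dec_dim=4, vocab_size=10, lm_dim=6, hidden_dim=8).step(decoder_state, lm_logits)` returns a 10-way probability distribution and the 6 gate values. Because the gate is learned jointly with the Seq2Seq model, the decoder can decide per feature and per step how much to rely on the language model.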
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
