Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

Shucong Zhang; Cong-Thanh Do; Rama Doddipatla; Erfan Loweimi; Peter Bell; Steve Renals

arXiv:2102.04697·eess.AS·February 10, 2021

TL;DR

This paper shows that a trained classifier (the upper layers of a network) can be frozen and reused within the same dataset, and introduces a top-down training method built on this observation that consistently improves speech recognition and language modelling results.

Contribution

The paper proposes a cascade training approach that trains a network from its upper layers (the classifier) down to its lower layers, exploiting the within-dataset transferability of the classifier to improve model performance.

Findings

01

Improved RNN ASR performance on Wall Street Journal

02

Enhanced self-attention ASR results on Switchboard

03

Better AWD-LSTM language model metrics on WikiText-2

Abstract

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and retraining the classifier (the upper layers) on the same dataset leads to worse performance. In this paper, for the first time, we show that the frozen classifier is transferable within the same dataset. We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers. We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks. The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
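
The abstract does not spell out the training schedule. The sketch below is one plausible reading of it, not the authors' implementation: it assumes that stage 0 trains the whole network end to end, and that each later stage freezes one more layer from the top (the classifier first), re-initialises every layer beneath the frozen ones, and retrains only those. `Layer`, `top_down_train`, `init_fn` and `train_fn` are hypothetical names; the actual training step is a caller-supplied callback, and details such as how many layers to freeze or how to choose between stages are assumptions. In a framework such as PyTorch, the `trainable` flag would correspond to calling `requires_grad_(False)` on a frozen layer's parameters.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Layer:
    # Hypothetical stand-in for one network layer: a name, its parameters,
    # and whether the optimizer is allowed to update it.
    name: str
    params: List[float] = field(default_factory=list)
    trainable: bool = True


def top_down_train(
    layers: List[Layer],                      # ordered bottom (input side) to top (classifier)
    init_fn: Callable[[Layer], None],         # hypothetical: re-initialises one layer in place
    train_fn: Callable[[List[Layer]], None],  # hypothetical: updates only layers with trainable=True
    num_frozen_stages: int,
) -> List[List[str]]:
    """Assumed top-down schedule; returns the names of the trained layers per stage."""
    history: List[List[str]] = []

    # Stage 0: ordinary end-to-end training of the whole network.
    for layer in layers:
        init_fn(layer)
        layer.trainable = True
    train_fn(layers)
    history.append([layer.name for layer in layers if layer.trainable])

    # Stage k: keep the top k already-trained layers (classifier first) frozen,
    # re-initialise every layer below them, and retrain only those lower layers.
    for k in range(1, num_frozen_stages + 1):
        if k >= len(layers):
            break  # leave at least one trainable layer
        cut = len(layers) - k
        for i, layer in enumerate(layers):
            if i >= cut:
                layer.trainable = False
            else:
                init_fn(layer)
                layer.trainable = True
        train_fn(layers)
        history.append([layer.name for layer in layers if layer.trainable])

    return history
```

For a four-layer model with `num_frozen_stages=2`, the returned history lists all four layers for stage 0, the three layers below the classifier for stage 1, and the bottom two layers for stage 2. Because the cut point only moves downward, each frozen layer keeps the weights it learned in the stage just before it was frozen, which is what makes the schedule a cascade.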

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Sigmoid Activation · Tanh Activation · Variational Dropout · Dropout · Weight Tying · DropConnect · Long Short-Term Memory · Activation Regularization · Temporal Activation Regularization · Embedding Dropout