KL Regularized Normalization Framework for Low Resource Tasks

Neeraj Kumar; Ankur Narang; Brejesh Lall

arXiv:2212.11275·cs.CL·December 23, 2022

Open Access

TL;DR

This paper introduces KL Regularized Normalization (KL-Norm), a normalization technique for low-resource NLP and speech tasks that improves generalization and reduces overfitting while adding negligible parameter and memory overhead.

Contribution

The paper proposes KL-Norm, a normalization method that is more expressive than existing normalization techniques and improves performance on low-resource tasks.

Findings

01. KL-Norm outperforms other normalization methods on low-resource NLP and speech tasks.

02. It reduces overfitting and improves out-of-domain generalization.

03. KL-Norm adds a negligible number of model parameters and negligible memory overhead.

Abstract

Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that transfer to a wide variety of downstream tasks. Large quantities of supervised data are difficult to obtain because resources and time are limited. In light of this, a significant amount of research has addressed adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks, and they have been used successfully in a wide variety of applications. Many normalization techniques have been proposed, but their success in low-resource downstream NLP and speech tasks is limited. One of the reasons is the inability to…

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Linear Warmup With Cosine Annealing · Softmax · Layer Normalization · Byte Pair Encoding · Dense Connections