Fine-tuning of Language Models with Discriminator

Vadim Popov; Mikhail Kudinov

arXiv:1811.04623 · cs.CL · January 16, 2019 · 1 cites

TL;DR

This paper introduces a fine-tuning method for language models that adds a discriminator-estimated reverse Kullback-Leibler divergence to the cross-entropy loss, lowering perplexity on Penn Treebank from 52.4 to 52.1.

Contribution

It proposes a new fine-tuning approach in which a discriminator trained in advance estimates the reverse Kullback-Leibler divergence, improving language model quality with minimal hyperparameter tuning.

Findings

01 Perplexity on Penn Treebank improved from 52.4 to 52.1

02 The method scales well across architectures and datasets

03 The only hyperparameter that needs tuning is the learning rate

Abstract

Cross-entropy loss is a common choice for multiclass classification tasks, and for language modeling in particular. Minimizing this loss results in language models of very good quality. We show that it is possible to make these models perform even better by fine-tuning them with the sum of the cross-entropy loss and the reverse Kullback-Leibler divergence. The latter is estimated using a discriminator network that we train in advance. During fine-tuning, the probabilities of rare words, which language models usually underestimate, increase. The novel approach that we propose allows us to reach state-of-the-art quality on Penn Treebank: perplexity decreases from 52.4 to 52.1. Our fine-tuning algorithm is rather fast, scales well to different architectures and datasets, and requires almost no hyperparameter tuning: the only hyperparameter that needs to be tuned is…
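
The abstract does not spell out how the discriminator yields a divergence estimate, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the usual convention that the reverse divergence is KL(q || p), with q the model's next-word distribution and p the data distribution, and it relies on the standard density-ratio identity: a discriminator that is optimal at telling data words from model samples outputs d(w) = p(w) / (p(w) + q(w)), so log((1 - d(w)) / d(w)) = log(q(w) / p(w)). The function names, the toy three-word vocabulary, and the equal weighting of the two loss terms are all hypothetical choices made for illustration.

```python
import numpy as np

def reverse_kl_from_discriminator(q, d, eps=1e-12):
    """Estimate KL(q || p) for one context from discriminator outputs.

    q: model next-word probabilities, shape (vocab,).
    d: discriminator probabilities that each word is "real" (from data),
       assumed close to p / (p + q). Hypothetical interface.
    """
    q = np.asarray(q, dtype=float)
    d = np.clip(np.asarray(d, dtype=float), eps, 1.0 - eps)
    # log((1 - d) / d) approximates log(q / p) when d is near optimal.
    log_ratio = np.log1p(-d) - np.log(d)
    return float(np.sum(q * log_ratio))

def fine_tuning_loss(q, target, d):
    """Cross-entropy on the observed word plus the estimated reverse KL."""
    cross_entropy = -np.log(q[target])
    return float(cross_entropy + reverse_kl_from_discriminator(q, d))

# Toy check: with the optimal discriminator the estimate matches the exact value.
p = np.array([0.7, 0.2, 0.1])     # assumed "true" next-word distribution
q = np.array([0.8, 0.15, 0.05])   # model that underestimates the rarest word
d_opt = p / (p + q)               # optimal discriminator for this p and q
exact = float(np.sum(q * np.log(q / p)))
estimate = reverse_kl_from_discriminator(q, d_opt)
loss = fine_tuning_loss(q, target=2, d=d_opt)
```

In practice the discriminator is a learned network and only approximates the optimal one, so the estimate carries the discriminator's error; the sketch uses the exact optimum solely to show that the identity recovers the true divergence.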

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis