Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods

Ning Miao; Yuxuan Song; Hao Zhou; Lei Li

arXiv:2007.06162 · cs.CL · July 14, 2020 · 1 citation

TL;DR

This paper introduces MC-Tailor, a Monte-Carlo-based method that improves fine-tuned pre-trained language models for text generation by truncating probability mass in over-estimated regions and transferring it to under-estimated ones.

Contribution

The paper presents MC-Tailor, a novel Monte-Carlo-based approach that addresses the over- and under-estimation that can arise when a pre-trained language model is fine-tuned on a small dataset.

Findings

01 MC-Tailor outperforms standard fine-tuning across multiple datasets.

02 It significantly reduces over- and under-estimation in probability predictions.

03 The method is effective and generalizable for text generation tasks.

Abstract

It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data. In practice, we observe that fine-tuning a pre-trained model on a small dataset may lead to over- and/or under-estimation problems. In this paper, we propose MC-Tailor, a novel method to alleviate these issues in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones. Experiments on a variety of text generation datasets show that MC-Tailor consistently and significantly outperforms the fine-tuning approach. Our code is available at this url.
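
To make the idea of moving probability mass concrete, here is a minimal toy sketch of plain rejection sampling over a four-item vocabulary. It is not the paper's algorithm: the distributions `p_model` and `p_data`, the function name `tailor_by_rejection`, and the assumption that the density ratio is known exactly are all hypothetical choices made for illustration. Items the model over-estimates are accepted less often, and items it under-estimates are accepted more often, so the accepted samples approximate the target distribution.

```python
import random
from collections import Counter


def tailor_by_rejection(p_model, ratio, n_samples, seed=0):
    """Draw from p_model and keep each draw with probability ratio[x] / max ratio.

    `ratio[x]` stands in for p_data(x) / p_model(x). It is assumed to be
    known exactly here, which is a simplification made for this toy example.
    """
    rng = random.Random(seed)
    items = list(p_model)
    weights = [p_model[x] for x in items]
    bound = max(ratio[x] for x in items)  # upper bound on the ratio
    accepted = []
    while len(accepted) < n_samples:
        x = rng.choices(items, weights=weights, k=1)[0]
        # Over-estimated items have a small ratio and are rejected more often.
        if rng.random() < ratio[x] / bound:
            accepted.append(x)
    return accepted


# Hypothetical toy distributions: the "model" over-estimates "a" and
# under-estimates "d" relative to a uniform "data" distribution.
p_model = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
p_data = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
ratio = {x: p_data[x] / p_model[x] for x in p_model}

samples = tailor_by_rejection(p_model, ratio, 20000)
freq = {x: count / len(samples) for x, count in Counter(samples).items()}
print(freq)
```

In this sketch the empirical frequencies of the accepted samples should land close to `p_data`, at the cost of discarding most draws of the over-estimated items.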

Code & Models

Repositories

NingMiao/MC-tailor · tf · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis