Universal Language Model Fine-tuning for Text Classification

Jeremy Howard; Sebastian Ruder

arXiv:1801.06146 · cs.CL · May 24, 2018

TL;DR

ULMFiT is a transfer learning method for NLP that fine-tunes a pretrained language model. It outperforms the state of the art on six text classification tasks and needs far less labeled data than training from scratch.

Contribution

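The paper presents ULMFiT, a fine-tuning method that can be applied to any NLP task, along with techniques that are key for fine-tuning a language model. The method outperforms prior state-of-the-art models and reduces the amount of labeled data required.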

Findings

01 Outperforms the state of the art on six text classification tasks

02 Reduces error by 18-24% on the majority of datasets

03 With only 100 labeled examples, matches the performance of training from scratch on 100x more data

Abstract

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
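The two headline numbers in the abstract are easiest to read as simple arithmetic: the 18-24% figure is a relative reduction in error, and "100x more data" puts the from-scratch baseline at about 10,000 examples. The sketch below is only an illustration of that arithmetic. The function name and the error rates are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only to show the calculation.
# They are NOT numbers reported in the paper.
baseline = 5.0   # made-up test error of an earlier model
improved = 4.0   # made-up test error of a newer model
reduction = relative_error_reduction(baseline, improved)
print(f"relative error reduction: {reduction:.0%}")

# "100 labeled examples ... 100x more data" from the abstract:
labeled_examples = 100
scratch_equivalent = labeled_examples * 100
print(f"from-scratch training set size: {scratch_equivalent}")
```

Under this reading, a drop of one percentage point counts as a 20% reduction when the baseline error is 5%, so the same absolute gain is a larger relative gain on datasets where the baseline error is already low.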

Code & Models

Datasets

avduarte333/arXivTection
dataset · 761 downloads

Taxonomy

Methods: Dropout · Adam · Sigmoid Activation · Tanh Activation · Temporal Activation Regularization · DropConnect · Long Short-Term Memory · Activation Regularization · Embedding Dropout · Variational Dropout