Better Language Model with Hypernym Class Prediction

He Bai; Tong Wang; Alessandro Sordoni; Peng Shi

arXiv:2203.10692·cs.CL·March 22, 2022

TL;DR

This paper trains neural language models with a curriculum that anneals from predicting WordNet hypernym classes to predicting tokens. The approach improves perplexity, and the authors hypothesize that it does so by improving generalization for rare words.

Contribution

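The paper proposes a curriculum learning method that maps words sharing a common WordNet hypernym to the same class and gradually shifts the training target from the class to the token, improving the perplexity of neural LMs without sacrificing performance on rare words.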

Findings

01. Consistent perplexity improvements on WikiText-103 and Arxiv datasets.

02. Performance gains achieved without sacrificing rare word accuracy.

03. Analysis of unsuccessful alternative methods and future directions.

Abstract

Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and…
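The curriculum described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `HYPERNYM_CLASS` table stands in for a real WordNet lookup, and the linear decay in `class_prob`, along with the names `make_targets` and `anneal_steps`, are assumptions made for illustration only. The idea shown is that early in training a word's prediction target is replaced by its hypernym class, and the replacement probability is annealed toward zero so that training ends on ordinary token prediction.

```python
import random

# Hypothetical toy mapping for illustration; the paper derives classes
# from WordNet hypernyms rather than from a hand-written table.
HYPERNYM_CLASS = {
    "dog": "<animal>",
    "cat": "<animal>",
    "oak": "<tree>",
    "pine": "<tree>",
}


def class_prob(step, anneal_steps):
    """Probability of using the class as the target at a given step.

    Assumed schedule: linear decay from 1.0 at step 0 to 0.0 at
    anneal_steps. The paper's exact schedule may differ.
    """
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def make_targets(tokens, step, anneal_steps, rng):
    """Build prediction targets for one sequence at one training step.

    Words with a known hypernym class are replaced by that class with
    probability class_prob(step, anneal_steps); all other words keep
    their token as the target.
    """
    p = class_prob(step, anneal_steps)
    targets = []
    for tok in tokens:
        cls = HYPERNYM_CLASS.get(tok)
        if cls is not None and rng.random() < p:
            targets.append(cls)
        else:
            targets.append(tok)
    return targets


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["the", "dog", "saw", "an", "oak"]
    for step in (0, 50, 100):
        print(step, make_targets(sentence, step, anneal_steps=100, rng=rng))
```

In a real training loop the targets would be vocabulary indices, with the class symbols added to the output vocabulary so that one softmax can score both classes and tokens; that detail is likewise an assumption of this sketch.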

Code & Models

Repositories

richardbaihe/robustlm
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis