Knowledge-Aware Language Model Pretraining

Corby Rosset; Chenyan Xiong; Minh Phan; Xia Song; Paul Bennett; Saurabh Tiwary

arXiv:2007.00655·cs.CL·February 5, 2021·44 cites

Open Access

TL;DR

This paper introduces a simple method for adding explicit knowledge signals to language model pretraining: entities are marked at the input and predicted at the output, which improves factual accuracy and downstream task performance without altering the model architecture.

Contribution

The authors propose a knowledge-aware pretraining approach that signals entities via an extended tokenizer and an additional prediction task, improving knowledge encoding without architectural changes.

Findings

01 Enhanced factual correctness in knowledge probing tasks

02 Improved zero-shot question-answering performance

03 More semantically rich hidden representations

Abstract

How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and…
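The abstract names an entity-extended tokenizer but does not describe how it works. The following is a minimal sketch of one plausible reading, not the authors' implementation: each word is paired with an entity id found by greedy longest-match lookup in a surface-form dictionary. The dictionary contents, the `NO_ENTITY` id, and the function name `entity_tokenize` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical surface-form dictionary (an assumption, not from the paper):
# maps a tuple of lowercased words to an entity id. Id 0 means "no entity".
NO_ENTITY = 0
ENTITY_DICT: Dict[Tuple[str, ...], int] = {
    ("new", "york"): 1,
    ("new", "york", "times"): 2,
    ("paris",): 3,
}
MAX_SPAN = max(len(key) for key in ENTITY_DICT)


def entity_tokenize(words: List[str]) -> List[int]:
    """Return one entity id per word using greedy longest-match lookup."""
    lowered = [w.lower() for w in words]
    entity_ids = [NO_ENTITY] * len(words)
    i = 0
    while i < len(words):
        matched = False
        # Try the longest candidate span first so that "New York Times"
        # wins over the shorter "New York".
        for span in range(min(MAX_SPAN, len(words) - i), 0, -1):
            key = tuple(lowered[i:i + span])
            if key in ENTITY_DICT:
                for j in range(i, i + span):
                    entity_ids[j] = ENTITY_DICT[key]
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return entity_ids


words = "The New York Times opened an office in Paris".split()
entity_ids = entity_tokenize(words)
for word, entity_id in zip(words, entity_ids):
    print(f"{word}\t{entity_id}")
```

Under this reading, the parallel sequence of entity ids would be the signal fed to the transformer alongside the ordinary word tokens, and the same ids could serve as targets for the additional entity prediction task at the output. How the two signals are actually combined is not stated in the abstract.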

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Dropout · Byte Pair Encoding · Multi-Head Attention · Residual Connection · Attention Is All You Need · Linear Warmup With Cosine Annealing · Attention Dropout