Knowledge-Aware Language Model Pretraining
Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary

TL;DR
This paper introduces a simple way to make language model pretraining knowledge-aware: it marks entities in the input and adds an entity prediction task at the output, which improves factual accuracy and downstream task performance without altering the model architecture.
Contribution
The authors propose a knowledge-aware pretraining approach that signals entities at the input via an entity-extended tokenizer and at the output via an additional entity prediction task, improving knowledge encoding without architectural changes.
Findings
Enhanced factual correctness in knowledge probing tasks
Improved zero-shot question-answering performance
More semantically rich hidden representations
Abstract
How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and…
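The input-side entity signal described in the abstract can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup against a small hand-written surface-form dictionary; the function names (`build_vocab`, `entity_tokenize`), the `NO_ENTITY` id, and the toy dictionary are all hypothetical and are not taken from the paper. The idea shown is only that each token carries two ids, a word id and an entity id, so the entity information travels alongside the ordinary token stream.

```python
# Hypothetical sketch of dual (word, entity) tokenization.
# All names and the toy dictionary are illustrative assumptions.

NO_ENTITY = 0  # entity id reserved for tokens outside any entity mention


def build_vocab(items, start=0):
    """Map each distinct item to an integer id, in first-seen order."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = start + len(vocab)
    return vocab


def entity_tokenize(words, entity_dict, word_vocab, entity_vocab, max_span=3):
    """Return parallel lists of word ids and entity ids.

    Surface forms are matched greedily, longest span first. Every token
    inside a matched span gets that entity's id; all others get NO_ENTITY.
    """
    word_ids = [word_vocab[w] for w in words]
    entity_ids = [NO_ENTITY] * len(words)
    i = 0
    while i < len(words):
        matched = False
        for n in range(min(max_span, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            if surface in entity_dict:
                eid = entity_vocab[entity_dict[surface]]
                entity_ids[i:i + n] = [eid] * n
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return word_ids, entity_ids


words = "the capital of new zealand is wellington".split()
entity_dict = {"new zealand": "New_Zealand", "wellington": "Wellington"}
word_vocab = build_vocab(words)
entity_vocab = build_vocab(entity_dict.values(), start=1)  # 0 is NO_ENTITY

word_ids, entity_ids = entity_tokenize(words, entity_dict, word_vocab, entity_vocab)
print(list(zip(words, word_ids, entity_ids)))
```

In a scheme like this, a model could embed both id streams and combine them at the input, and could be trained to predict the next entity id alongside the next word id at the output. How the two signals are actually combined and weighted is not specified in the abstract, so those details are left out here.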
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Dropout · Byte Pair Encoding · Multi-Head Attention · Residual Connection · Attention Is All You Need · Linear Warmup With Cosine Annealing · Attention Dropout
