Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base

Cunxiang Wang; Fuli Luo; Yanyang Li; Runxin Xu; Fei Huang; Yue Zhang

arXiv:2204.07994 · cs.CL · October 12, 2023 · 1 citation

TL;DR

This paper investigates how pre-trained language models retrieve knowledge from unstructured text, identifies their limitations, and proposes self-supervised methods to improve their knowledge acquisition, showing that these methods are effective on knowledge-intensive tasks.

Contribution

According to the authors, it introduces the first fully self-supervised approach for enhancing knowledge learning during the continual pre-training of language models.
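
The title refers to salient span masking. As a general illustration only (this page does not describe the authors' exact selection rule), the sketch below shows the generic idea popularized by REALM: instead of masking random subword tokens, whole spans judged to carry knowledge are replaced with [MASK] so that the model must recover them from context. The function name `salient_span_mask` is hypothetical, and the salient spans are assumed to be supplied by some external tagger (for example, a named-entity recognizer); they are hand-written here.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def salient_span_mask(
    tokens: Sequence[str],
    salient_spans: Sequence[Tuple[int, int]],
    num_spans: int = 1,
    seed: int = 0,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Mask every token of up to `num_spans` randomly chosen salient spans.

    Hypothetical helper. `salient_spans` holds half-open (start, end) token
    indices, assumed to come from an external tagger. Returns the masked
    sequence and the (position, original_token) pairs the model must predict.
    """
    rng = random.Random(seed)
    k = min(num_spans, len(salient_spans))
    chosen = rng.sample(list(salient_spans), k=k)
    # Collect positions in a set so overlapping spans are masked only once.
    positions = sorted({i for start, end in chosen for i in range(start, end)})
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    targets = [(i, tokens[i]) for i in positions]
    return masked, targets


tokens = "Barack Obama was born in Honolulu".split()
spans = [(0, 2), (5, 6)]  # hand-labelled entity spans, for illustration only
masked, targets = salient_span_mask(tokens, spans, num_spans=2)
print(masked)   # ['[MASK]', '[MASK]', 'was', 'born', 'in', '[MASK]']
print(targets)  # [(0, 'Barack'), (1, 'Obama'), (5, 'Honolulu')]
```

Masking a whole span at once prevents the model from filling in "Obama" merely by copying from the visible "Barack", which is the usual motivation for span-level rather than token-level masking.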

Findings

01

PLMs pay less attention to knowledge-bearing (K-B) tokens and make more prediction errors on them.

02

Proposed methods improve knowledge retrieval in PLMs.

03

Effective on knowledge-intensive NLP tasks.
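
The first finding compares how often a PLM's cloze predictions are wrong on K-B versus knowledge-free (K-F) tokens. A minimal sketch of that comparison, assuming each masked position already has a model prediction, a gold token, and an annotator label; the function name and the toy data are hypothetical and are not results from the paper.

```python
from typing import Dict, Sequence


def error_rate_by_label(
    predictions: Sequence[str],
    gold: Sequence[str],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Fraction of wrong cloze predictions for each token label (e.g. 'K-B', 'K-F')."""
    if not (len(predictions) == len(gold) == len(labels)):
        raise ValueError("inputs must have equal length")
    totals: Dict[str, int] = {}
    errors: Dict[str, int] = {}
    for pred, ref, lab in zip(predictions, gold, labels):
        totals[lab] = totals.get(lab, 0) + 1
        # A bool adds as 0 or 1, so this counts mismatches.
        errors[lab] = errors.get(lab, 0) + (pred != ref)
    return {lab: errors[lab] / totals[lab] for lab in totals}


# Toy, made-up example: one sentence with four masked positions.
gold = ["Obama", "was", "born", "Honolulu"]
preds = ["Clinton", "was", "born", "Chicago"]
labels = ["K-B", "K-F", "K-F", "K-B"]
print(error_rate_by_label(preds, gold, labels))  # {'K-B': 1.0, 'K-F': 0.0}
```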

Abstract

Pre-trained language models (PLMs) like BERT have made significant progress on various downstream NLP tasks. However, using cloze-style tests, recent work finds that PLMs fall short in acquiring knowledge from unstructured text. To understand how PLMs retrieve knowledge internally, we first define knowledge-bearing (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and ask professional annotators to label some samples manually. We then find that PLMs are more likely to give wrong predictions on K-B tokens and pay less attention to those tokens inside the self-attention module. Based on these observations, we develop two solutions that help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge-intensive tasks show the effectiveness of the proposed methods. To the best of our knowledge,…
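
The abstract reports that K-B tokens receive less attention inside the self-attention module. The abstract does not say how attention is aggregated across layers and heads, so the sketch below makes one assumption: for a single layer, it averages the weight each key token receives over all heads and all query positions, then averages that score per label. The input is a nested list `attn[h][q][k]` whose rows sum to 1, such as one layer of the attention tensors returned by Hugging Face Transformers when `output_attentions=True` is passed. Both function names and the example weights are hypothetical.

```python
from typing import Dict, List, Sequence


def attention_received(attn: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Mean attention each key token receives, averaged over heads and queries.

    attn[h][q][k] is the weight query q assigns to key k in head h.
    If every row sums to 1, the returned scores also sum to 1.
    """
    num_heads = len(attn)
    seq_len = len(attn[0])
    received = [0.0] * seq_len
    for head in attn:
        for row in head:
            for k, w in enumerate(row):
                received[k] += w
    return [r / (num_heads * seq_len) for r in received]


def mean_by_label(scores: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Average a per-token score within each token label."""
    groups: Dict[str, List[float]] = {}
    for s, lab in zip(scores, labels):
        groups.setdefault(lab, []).append(s)
    return {lab: sum(v) / len(v) for lab, v in groups.items()}


# One head, three tokens; made-up weights for illustration.
attn = [[[0.2, 0.6, 0.2],
         [0.1, 0.8, 0.1],
         [0.3, 0.4, 0.3]]]
scores = attention_received(attn)
print(mean_by_label(scores, ["K-B", "K-F", "K-B"]))  # approx. {'K-B': 0.2, 'K-F': 0.6}
```

Averaging over query positions measures how much a token is looked at by the rest of the sequence, which matches the abstract's phrasing; other reasonable choices (a single layer, maximum over heads) could give different numbers.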

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Adam · Multi-Head Attention · Residual Connection · Dense Connections · Attention Dropout · Layer Normalization · Weight Decay