Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Zhi-Xiu Ye; Qian Chen; Wen Wang; Zhen-Hua Ling

arXiv:1908.06725 · cs.CL · May 7, 2020 · 70 cites

TL;DR

This paper introduces a simple pre-training method called 'align, mask, and select' (AMS) that effectively incorporates commonsense knowledge into language models, significantly improving performance on commonsense benchmarks without harming general NLP tasks.

Contribution

The paper presents an automatic dataset creation method and a pre-training approach that enhance commonsense reasoning in language models, outperforming previous state-of-the-art models on CommonsenseQA and the Winograd Schema Challenge.

Findings

01

Improved accuracy on CommonsenseQA and Winograd Schema Challenge

02

Maintains performance on sentence classification and natural language inference tasks

03

Demonstrates effectiveness of AMS in integrating commonsense knowledge

Abstract

The state-of-the-art pre-trained language representation models, such as Bidirectional Encoder Representations from Transformers (BERT), rarely incorporate commonsense knowledge or other knowledge explicitly. We propose a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge. We also observe that…
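
The abstract names the three AMS steps without detailing them, so the following is only a minimal sketch of how such a pipeline might build multi-choice questions. It assumes a knowledge base of (head, relation, tail) triples with single-word concepts, a plain-text sentence corpus, and a `[MASK]` placeholder token. All function names, the placeholder token, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import re

def align(triples, sentences):
    """Align: pair each triple with sentences mentioning both of its concepts."""
    pairs = []
    for head, rel, tail in triples:
        for sent in sentences:
            words = set(re.findall(r"\w+", sent.lower()))
            if head in words and tail in words:
                pairs.append(((head, rel, tail), sent))
    return pairs

def mask(sentence, concept, token="[MASK]"):
    """Mask: replace the answer concept with a placeholder (token is an assumption)."""
    pattern = rf"\b{re.escape(concept)}\b"
    return re.sub(pattern, token, sentence, flags=re.IGNORECASE)

def select(triple, triples, num_distractors=2):
    """Select: pick distractors from triples sharing the same head and relation."""
    head, rel, tail = triple
    candidates = sorted({t for h, r, t in triples if h == head and r == rel and t != tail})
    return candidates[:num_distractors]

def build_examples(triples, sentences, num_distractors=2):
    """Combine the three steps into multi-choice question records."""
    examples = []
    for triple, sent in align(triples, sentences):
        distractors = select(triple, triples, num_distractors)
        if len(distractors) < num_distractors:
            continue  # skip triples without enough distractors
        answer = triple[2]
        examples.append({
            "question": mask(sent, answer),
            "answer": answer,
            "choices": sorted([answer] + distractors),
        })
    return examples

# Hypothetical toy data, for illustration only.
triples = [
    ("bird", "CapableOf", "fly"),
    ("bird", "CapableOf", "sing"),
    ("bird", "CapableOf", "perch"),
]
sentences = ["A bird can fly over the lake."]
examples = build_examples(triples, sentences)
print(examples)
```

In this sketch the distractors share the head concept and relation with the correct answer, which is one way to make the wrong choices plausible enough that a model cannot rule them out from surface cues alone.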

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax