Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling

TL;DR
This paper introduces a simple pre-training method called 'align, mask, and select' (AMS) that effectively incorporates commonsense knowledge into language models, significantly improving performance on commonsense benchmarks without harming general NLP tasks.
Contribution
The paper presents a novel automatic dataset creation method and pre-training approach that enhances commonsense reasoning in language models, outperforming previous state-of-the-art methods.
Findings
Improved accuracy on CommonsenseQA and Winograd Schema Challenge
Maintains performance on sentence classification and natural language inference tasks
Demonstrates effectiveness of AMS in integrating commonsense knowledge
Abstract
The state-of-the-art pre-trained language representation models, such as Bidirectional Encoder Representations from Transformers (BERT), rarely incorporate commonsense knowledge or other knowledge explicitly. We propose a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed "align, mask, and select" (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge. We also observe that…
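The abstract names the three AMS steps but does not spell them out here, so the following is only a toy sketch of one plausible reading, not the authors' implementation. It assumes a knowledge base of (head, relation, tail) triples and a small text corpus. "Align" finds sentences that mention both concepts of a triple, "mask" replaces one concept with a placeholder to form a question, and "select" picks distractors from other triples that share the same head and relation. The `[QW]` placeholder, the function names, and the sample data are all hypothetical.

```python
import random

MASK_TOKEN = "[QW]"  # hypothetical placeholder for the masked concept


def align(triple, sentences):
    """Keep sentences that mention both concepts (naive substring match)."""
    head, _, tail = triple
    return [s for s in sentences if head in s.lower() and tail in s.lower()]


def mask(sentence, concept):
    """Replace the first occurrence of `concept` with the mask token."""
    idx = sentence.lower().find(concept)
    return sentence[:idx] + MASK_TOKEN + sentence[idx + len(concept):]


def select(triple, triples, k, rng):
    """Pick up to k distractors: tails of other triples with the same head and relation."""
    head, rel, tail = triple
    candidates = sorted({t for h, r, t in triples if h == head and r == rel and t != tail})
    return rng.sample(candidates, min(k, len(candidates)))


def build_example(triple, triples, sentences, k=2, seed=0):
    """Build one multi-choice question from a triple, or None if nothing aligns."""
    rng = random.Random(seed)
    aligned = align(triple, sentences)
    if not aligned:
        return None
    question = mask(aligned[0], triple[2])
    choices = select(triple, triples, k, rng) + [triple[2]]
    rng.shuffle(choices)
    return {"question": question, "choices": choices, "answer": triple[2]}


# Made-up sample data, for illustration only.
triples = [
    ("bird", "CapableOf", "fly"),
    ("bird", "CapableOf", "sing"),
    ("bird", "CapableOf", "swim"),
    ("fish", "CapableOf", "swim"),
]
sentences = ["A bird can fly over the hills.", "Fish swim in rivers."]

example = build_example(triples[0], triples, sentences)
print(example["question"])
print(example["choices"])
```

Each resulting record has a masked question, a shuffled list of answer choices, and the correct answer, which is the general shape a multi-choice question answering pre-training set needs. A real pipeline would need proper tokenization and filtering rather than substring matching.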
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
