Train No Evil: Selective Masking for Task-Guided Pre-Training
Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

TL;DR
This paper introduces a task-guided pre-training framework with selective masking to better capture domain- and task-specific patterns, improving efficiency and performance in downstream tasks.
Contribution
It proposes a novel three-stage training framework with a selective masking strategy for task-guided pre-training, enhancing domain and task-specific pattern learning.
Findings
Achieves comparable or better performance on sentiment analysis tasks.
Uses less than 50% of the training computation cost.
Demonstrates effectiveness and efficiency of the proposed method.
Abstract
Recently, pre-trained language models mostly follow the pre-train-then-fine-tuning paradigm and have achieved great performance on various downstream tasks. However, since the pre-training stage is typically task-agnostic and the fine-tuning stage usually suffers from insufficient supervised data, the models cannot always well capture the domain-specific and task-specific patterns. In this paper, we propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning. In this stage, the model is trained by masked language modeling on in-domain unsupervised data to learn domain-specific patterns and we propose a novel selective masking strategy to learn task-specific patterns. Specifically, we design a method to measure the importance of each token in sequences and selectively mask the important tokens.…
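The selective masking idea in the abstract (score each token's importance, then mask the important ones) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract shown does not specify the scoring rule, so the sketch substitutes a simple leave-one-out confidence drop, and `toy_confidence`, `LEXICON`, and the `threshold` value are hypothetical stand-ins for a task classifier fine-tuned on the downstream data.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"
# Hypothetical sentiment lexicon, standing in for a trained task classifier.
LEXICON = {"great", "awful", "love", "hate"}


def toy_confidence(tokens: Sequence[str]) -> float:
    """Toy classifier confidence: grows with the number of lexicon hits."""
    hits = sum(1 for t in tokens if t in LEXICON)
    return min(1.0, 0.5 + 0.25 * hits)


def leave_one_out_importance(
    tokens: Sequence[str], confidence: Callable[[Sequence[str]], float]
) -> List[float]:
    """Importance of token i = confidence drop when token i is removed.

    This is a stand-in scoring rule, not necessarily the paper's.
    """
    full = confidence(tokens)
    return [
        full - confidence(list(tokens[:i]) + list(tokens[i + 1:]))
        for i in range(len(tokens))
    ]


def selective_mask(
    tokens: Sequence[str], scores: Sequence[float], threshold: float
) -> List[str]:
    """Replace every token whose importance meets the threshold with MASK."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [MASK if s >= threshold else t for t, s in zip(tokens, scores)]


tokens = ["the", "movie", "was", "great"]
scores = leave_one_out_importance(tokens, toy_confidence)
masked = selective_mask(tokens, scores, threshold=0.1)
print(masked)  # ['the', 'movie', 'was', '[MASK]']
```

Under these assumptions only the sentiment-bearing token is masked, so a masked language modeling objective on in-domain text would be concentrated on task-relevant words rather than on randomly chosen ones.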
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
