Investigating Masking-based Data Generation in Language Models
Ed S. Ma

TL;DR
This paper explores how masked language models like BERT can be used for data augmentation in NLP, showing that such methods effectively improve model performance across various tasks.
Contribution
It provides a comprehensive discussion on utilizing masked language models for data augmentation, highlighting their potential to enhance NLP model training.
Findings
Masked language model-based augmentation improves NLP task performance.
MLM data augmentation is simple yet effective.
The approach broadens the application of MLM in NLP training.
Abstract
The current era of natural language processing (NLP) has been defined by the prominence of pre-trained language models since the advent of BERT. A defining feature of BERT and models with similar architectures is the masked language modeling objective, in which part of the input is intentionally masked and the model is trained to predict the masked information. Data augmentation is a data-driven technique widely used in machine learning, including research areas such as computer vision and natural language processing, to improve model performance by artificially enlarging the training data set with designated techniques. Masked language modeling (MLM), an essential training feature of BERT, has introduced a novel approach to effective pre-training of Transformer-based models for natural language processing tasks. Recent studies have utilized masked language models to generate…
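The masking-and-prediction idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: the helper names `mask_tokens`, `augment`, and `toy_predict` are hypothetical, the 15% default masking rate is an assumed value rather than one taken from the paper, and `toy_predict` is a lookup-table stand-in for a real masked language model such as BERT.

```python
import random

MASK = "[MASK]"  # placeholder string for a masked position


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with MASK.

    Returns the masked copy and the sorted list of masked positions.
    At least one token is masked whenever the input is non-empty.
    """
    if not tokens:
        return [], []
    rng = rng or random.Random()
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions


def augment(tokens, predict, mask_prob=0.15, rng=None):
    """Build one augmented copy: mask some tokens, then fill each
    masked position with predict(masked_tokens, position)."""
    masked, positions = mask_tokens(tokens, mask_prob, rng)
    filled = list(masked)
    for i in positions:
        filled[i] = predict(masked, i)
    return filled


# Hypothetical stand-in for a masked language model: it guesses a word
# from the token to the left. A real model would score the whole context.
LEFT_CONTEXT_GUESSES = {"the": "film", "was": "great"}


def toy_predict(masked, position):
    left = masked[position - 1] if position > 0 else None
    return LEFT_CONTEXT_GUESSES.get(left, "thing")


if __name__ == "__main__":
    sentence = "the movie was good".split()
    print(augment(sentence, toy_predict, mask_prob=0.5, rng=random.Random(0)))
```

In practice, `toy_predict` would be swapped for a call into a pre-trained masked language model, and the augmented sentences would be added to the training set alongside the originals with their labels unchanged.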
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Adam · Byte Pair Encoding · Linear Warmup With Linear Decay · Linear Layer · Attention Dropout · Label Smoothing
