Enabling Autoregressive Models to Fill In Masked Tokens
Daniel Israel, Aditya Grover, Guy Van den Broeck

TL;DR
This paper introduces MARIA, a hybrid model that combines a pre-trained autoregressive (AR) model and a masked language model (MLM) to enable fast and effective masked token infilling, outperforming discrete diffusion models on infilling tasks.
Contribution
The paper proposes MARIA, a novel architecture that combines a pre-trained MLM and a pre-trained AR model by training a linear decoder on their concatenated hidden states, improving masked infilling performance.
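A minimal sketch of the decoding step, assuming the linear decoder is a single affine map (weight matrix plus bias) from the concatenated AR and MLM hidden states to vocabulary logits. The function names, list-based vectors, and toy dimensions are hypothetical and only illustrate the shape of the computation; a real implementation would operate on tensors taken from the two pre-trained models.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_decoder_logits(h_ar: Vector, h_mlm: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical MARIA-style head: logits = W [h_ar ; h_mlm] + b.

    weight has one row per vocabulary entry, each of length len(h_ar) + len(h_mlm).
    """
    features = list(h_ar) + list(h_mlm)  # concatenate the two hidden states
    if len(weight) != len(bias):
        raise ValueError("weight and bias must have one entry per vocabulary item")
    for row in weight:
        if len(row) != len(features):
            raise ValueError("each weight row must match the concatenated hidden size")
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weight, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over vocabulary logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 2-dim AR state, 2-dim MLM state, vocabulary of 3 tokens.
h_ar = [1.0, 0.0]
h_mlm = [0.0, 2.0]
weight = [
    [1.0, 0.0, 0.0, 0.0],  # token 0 reads the first AR feature
    [0.0, 0.0, 0.0, 1.0],  # token 1 reads the second MLM feature
    [0.0, 1.0, 1.0, 0.0],  # token 2 reads the zero-valued features
]
bias = [0.0, 0.0, 0.5]
logits = linear_decoder_logits(h_ar, h_mlm, weight, bias)
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(logits, predicted)  # [1.0, 2.0, 0.5] 1
```

In this sketch only `weight` and `bias` carry trainable parameters, which matches the summary's description of training a linear decoder on top of the two existing models.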
Findings
MARIA outperforms discrete diffusion models on infilling tasks.
It retains the inference-speed advantages of AR models.
The approach combines the strengths of both paradigms: the MLM's use of both past and future context and the AR model's efficient inference.
Abstract
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intrinsic computational inefficiencies during both training and inference that hinder their scalability. This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that leverages the strengths of both paradigms to achieve state-of-the-art masked infilling performance. MARIA combines a pre-trained MLM and AR model by training a linear decoder that takes their concatenated hidden states as input. This minimal modification enables the AR model to perform infilling while retaining its inherent advantages in terms…
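The abstract states that this combination lets the AR model perform infilling. One way to picture this, under stated assumptions, is a left-to-right loop: the MLM encodes the masked sequence (assumed here to run once), and at each masked position the AR state for the already-filled prefix is paired with the MLM state for that position and passed to the decoder. The callables `mlm_states`, `ar_state`, and `decode` are hypothetical stand-ins for the two models and the linear decoder, not the paper's code; a real AR model could reuse its KV cache instead of re-reading the prefix.

```python
from typing import Callable, List, Optional, Sequence

MASK = None  # hypothetical marker for a masked position

Hidden = List[float]


def infill_left_to_right(
    tokens: Sequence[Optional[int]],
    mlm_states: Callable[[Sequence[Optional[int]]], List[Hidden]],
    ar_state: Callable[[Sequence[int]], Hidden],
    decode: Callable[[Hidden, Hidden], int],
) -> List[int]:
    """Fill every MASK in `tokens` from left to right (illustrative sketch)."""
    # Assumption: a single bidirectional MLM pass over the masked input.
    h_mlm = mlm_states(tokens)
    filled: List[int] = []
    for i, tok in enumerate(tokens):
        if tok is not MASK:
            filled.append(tok)  # observed token: copy it into the prefix
            continue
        # AR state conditioned on everything to the left, including earlier infills.
        h_ar = ar_state(filled)
        filled.append(decode(h_ar, h_mlm[i]))
    return filled


# Toy stand-ins so the loop runs end to end (not real models).
def toy_mlm(seq):
    return [[float(i)] for i in range(len(seq))]  # "state" = position index


def toy_ar(prefix):
    return [float(sum(prefix))]  # "state" = sum of the filled prefix


def toy_decode(h_ar, h_mlm):
    return int(h_ar[0]) + int(h_mlm[0])


print(infill_left_to_right([5, MASK, 7, MASK], toy_mlm, toy_ar, toy_decode))
# [5, 6, 7, 21]
```

The last value, 21, is the prefix sum 5 + 6 + 7 = 18 plus position 3, so it depends on the earlier infill 6: each infilled token is fed back into the AR context before the next mask is decoded.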
Taxonomy
Topics: Handwritten Text Recognition Techniques · Human Motion and Animation
Methods: Diffusion
