Enabling Autoregressive Models to Fill In Masked Tokens

Daniel Israel; Aditya Grover; Guy Van den Broeck

arXiv:2502.06901 · cs.LG · February 12, 2025

TL;DR

This paper introduces MARIA, a hybrid model that combines an autoregressive (AR) model and a masked language model (MLM) to enable fast and effective masked-token infilling, outperforming discrete diffusion models on infilling tasks.

Contribution

The paper proposes MARIA, an architecture that combines a pre-trained MLM and a pre-trained AR model by training a linear decoder on their concatenated hidden states, which improves masked infilling performance.
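As a rough illustration, the linear decoder can be written as a single affine map followed by a softmax. This is a sketch under stated assumptions, not the paper's exact formulation. The symbols h, W, b, d_M, d_A and V are our own notation for the MLM and AR hidden sizes and the vocabulary size. Pairing the MLM state at the masked position i with the AR state at position i−1 is also an assumption, based on how AR models normally predict the next token. The paper may align the two states differently.

```latex
\[
p(x_i \mid \cdot) = \operatorname{softmax}\!\left( W \begin{bmatrix} h_i^{\mathrm{MLM}} \\ h_{i-1}^{\mathrm{AR}} \end{bmatrix} + b \right),
\qquad W \in \mathbb{R}^{V \times (d_M + d_A)}, \quad b \in \mathbb{R}^{V}
\]
```

Because the decoder is linear, it has V(d_M + d_A) + V parameters. If the two base models are kept frozen (an assumption here), these are the only weights that training updates.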

Findings

01

MARIA outperforms discrete diffusion models on infilling tasks.

02

MARIA retains the inference-speed advantages of AR models.

03

The approach combines the strengths of the MLM and AR paradigms.

Abstract

Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intrinsic computational inefficiencies during both training and inference that hinder their scalability. This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that leverages the strengths of both paradigms to achieve state-of-the-art masked infilling performance. MARIA combines a pre-trained MLM and AR model by training a linear decoder that takes their concatenated hidden states as input. This minimal modification enables the AR model to perform infilling while retaining its inherent advantages in terms…
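The infilling procedure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation:

- `toy_mlm_hidden` and `toy_ar_hidden` are hypothetical stand-ins for the pre-trained MLM and AR models.
- `MASK` and `VOCAB` are made-up constants.
- Masked positions are filled greedily from left to right.
- The MLM hidden states are assumed to be computed once on the masked input.
- The decoder weights `W` and `b` are random here, whereas MARIA trains them.

```python
import random
from typing import Callable, List, Sequence

MASK = -1   # hypothetical token id marking a masked position
VOCAB = 5   # toy vocabulary size (hypothetical)

Vector = List[float]


def linear_decoder(h_mlm: Vector, h_ar: Vector, W: List[Vector], b: Vector) -> Vector:
    """Vocabulary logits from the concatenated hidden states: W @ [h_mlm; h_ar] + b."""
    h = h_mlm + h_ar  # list concatenation
    return [sum(w * x for w, x in zip(row, h)) + b_i for row, b_i in zip(W, b)]


def toy_mlm_hidden(tokens: Sequence[int]) -> List[Vector]:
    """Hypothetical MLM stand-in: position i sees its left AND right neighbours."""
    n = len(tokens)
    return [[float(tokens[i - 1]) if i > 0 else 0.0,
             float(tokens[i + 1]) if i + 1 < n else 0.0]
            for i in range(n)]


def toy_ar_hidden(prefix: Sequence[int]) -> Vector:
    """Hypothetical AR stand-in: the state after reading only the prefix."""
    return [float(len(prefix)), float(prefix[-1]) if prefix else 0.0]


def infill(tokens: Sequence[int],
           mlm_hidden: Callable[[Sequence[int]], List[Vector]],
           ar_hidden: Callable[[Sequence[int]], Vector],
           W: List[Vector], b: Vector) -> List[int]:
    """Greedily fill MASK positions from left to right."""
    h_mlm = mlm_hidden(tokens)  # assumption: run once on the masked input
    out = list(tokens)
    for i in range(len(out)):
        if out[i] != MASK:
            continue
        # The AR state sees out[:i], including tokens filled in earlier steps.
        logits = linear_decoder(h_mlm[i], ar_hidden(out[:i]), W, b)
        out[i] = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
    return out


# Usage: random (untrained) decoder weights, sized to match the toy models.
rng = random.Random(0)
D_MLM, D_AR = 2, 2
W = [[rng.uniform(-1.0, 1.0) for _ in range(D_MLM + D_AR)] for _ in range(VOCAB)]
b = [0.0] * VOCAB
print(infill([1, MASK, MASK, 3, MASK], toy_mlm_hidden, toy_ar_hidden, W, b))
```

The sketch is meant to show how the two models split the work. The MLM supplies future context at each masked position. The AR model only ever consumes a left-to-right prefix, which keeps it compatible with ordinary left-to-right decoding.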

Videos

Enabling Autoregressive Models to Fill In Masked Tokens · Underline

Taxonomy

Topics: Handwritten Text Recognition Techniques · Human Motion and Animation

Methods: Diffusion