MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Savya Khosla; Aditi Tiwari; Kushal Kafle; Simon Jenni; Handong Zhao; John Collomosse; Jing Shi

arXiv:2501.08648·cs.CL·February 17, 2025

TL;DR

MAGNET enhances decoder-only large language models by integrating representation learning and infilling capabilities through a unified training approach, resulting in improved representations, context-aware infilling, and maintained reasoning abilities.

Contribution

Introduces MAGNET, a method that adapts decoder-only LLMs with three self-supervised objectives and an attention mechanism combining bidirectional and causal attention, so that all objectives can be trained together.

Findings

1. Outperforms strong text encoders in representation tasks

2. Generates contextually appropriate text infills

3. Maintains knowledge and reasoning during generation

Abstract

While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, unidirectional and bidirectional models are typically trained separately with distinct objectives (generation and representation learning). This separation overlooks the opportunity for developing a more versatile language model and for these objectives to complement each other. In this work, we propose MAGNET, a method for adapting decoder-only LLMs to generate robust representations and infill missing text spans. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. Our results demonstrate that LLMs adapted with MAGNET (1) surpass strong text encoders on token-level and…
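The abstract says the attention mechanism combines bidirectional and causal attention but does not spell out the masking rule. The sketch below shows one possible way such a hybrid mask could look, under an assumption that is ours and not confirmed by the source: context tokens attend to each other in both directions, while tokens in the span being infilled attend to all context tokens and causally to earlier span tokens. The function name `hybrid_attention_mask` and the `span` parameter are hypothetical, chosen only for illustration.

```python
def hybrid_attention_mask(n_tokens, span):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumption (not from the paper): `span` is a half-open (start, end) range of
    positions to be infilled; everything else is treated as context.
    """
    start, end = span
    if not (0 <= start <= end <= n_tokens):
        raise ValueError("span must lie within the sequence")

    in_span = [start <= i < end for i in range(n_tokens)]
    mask = []
    for i in range(n_tokens):
        row = []
        for j in range(n_tokens):
            if not in_span[j]:
                # Context keys are visible to every query (bidirectional part).
                allowed = True
            elif in_span[i]:
                # Span queries see span keys only at or before themselves (causal part).
                allowed = j <= i
            else:
                # In this sketch, context queries never see span keys.
                allowed = False
            row.append(allowed)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Five tokens, with positions 2 and 3 forming the span to infill.
    for row in hybrid_attention_mask(5, (2, 4)):
        print("".join("x" if allowed else "." for allowed in row))
```

With an empty span this reduces to fully bidirectional attention, and with a span covering the whole sequence it reduces to an ordinary causal mask, which is why a single mask of this kind could in principle serve both representation and generation objectives.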

Videos

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities · underline

Taxonomy

Topics: Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need