MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
Savya Khosla, Aditi Tiwari, Kushal Kafle, Simon Jenni, Handong Zhao, John Collomosse, Jing Shi

TL;DR
MAGNET enhances decoder-only large language models by integrating representation learning and infilling capabilities through a unified training approach, resulting in improved representations, context-aware infilling, and maintained reasoning abilities.
Contribution
Introduces MAGNET, a novel method that combines bidirectional and causal attention in decoder-only LLMs for versatile training and improved performance.
Findings
Outperforms strong text encoders in representation tasks
Generates contextually appropriate text infills
Maintains knowledge and reasoning during generation
Abstract
While originally designed for unidirectional generative modeling, decoder-only large language models (LLMs) are increasingly being adapted for bidirectional modeling. However, unidirectional and bidirectional models are typically trained separately with distinct objectives (generation and representation learning). This separation overlooks the opportunity for developing a more versatile language model and for these objectives to complement each other. In this work, we propose MAGNET, a method for adapting decoder-only LLMs to generate robust representations and infill missing text spans. MAGNET employs three self-supervised training objectives and introduces an attention mechanism that combines bidirectional and causal attention, enabling unified training across all objectives. Our results demonstrate that LLMs adapted with MAGNET (1) surpass strong text encoders on token-level and…
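The abstract mentions an attention mechanism that combines bidirectional and causal attention, but this summary does not spell out the exact rule. As a minimal sketch only, the Python below builds one plausible hybrid mask under an assumed rule: every query may attend to all context tokens in both directions, while tokens inside a span being infilled are visible only causally. The function name `hybrid_attention_mask`, the `is_span` flag list, and the visibility rule itself are illustrative assumptions, not the authors' implementation.

```python
from typing import List


def hybrid_attention_mask(is_span: List[bool]) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumed rule (hypothetical, for illustration only):
      * a context key (is_span[j] is False) is visible to every query,
        which gives bidirectional attention over the context;
      * a span key (is_span[j] is True) is visible only to queries at the
        same or a later position (j <= i), which gives causal attention
        over the text being generated.
    """
    n = len(is_span)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            visible = (not is_span[j]) or (j <= i)
            row.append(visible)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Four tokens: context, span, span, context.
    flags = [False, True, True, False]
    for row in hybrid_attention_mask(flags):
        print("".join("1" if allowed else "0" for allowed in row))
```

Under this assumed rule the two extremes reduce to familiar cases: with no span tokens the mask is fully bidirectional, and with only span tokens it is the standard lower-triangular causal mask.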
Taxonomy
Topics: Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need
