Guided Transfer Learning for Discrete Diffusion Models

Julian Kleutgens; Claudio Battiloro; Lingkai Kong; Benjamin Grewe; Francesca Dominici; Mauricio Tec

arXiv:2512.10877·cs.LG·April 16, 2026


TL;DR

This paper introduces Guided Transfer Learning (GTL) for discrete diffusion models: a ratio-based guidance approach that adapts a pretrained model to a related target distribution at a cost that scales linearly with vocabulary size. It is most effective when target data is scarce.

Contribution

The paper proposes a practical, scalable algorithm for transfer learning in discrete diffusion models (DMs). It addresses the computational cost of directly extending ratio-based guidance to discrete DMs and demonstrates the method on language and synthetic-data tasks.

Findings

01. GTL reduces transfer learning cost to linear in vocabulary size.

02. GTL outperforms fine-tuning when target data is limited.

03. Poor source-target overlap hampers ratio-based guidance effectiveness.
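
To make the third finding concrete, here is a minimal toy sketch of ratio-based reweighting on a single categorical distribution. It is not the paper's GTL algorithm: it assumes the exact target/source ratio is known, whereas a real system would have to estimate it, and it ignores the diffusion process entirely. The names `exact_ratio` and `guide_with_ratio` and the four-token vocabulary are hypothetical, chosen only for illustration.

```python
def exact_ratio(target_probs, source_probs):
    """Per-token ratio target/source (assumed known here; hypothetical helper).

    Where the source assigns zero probability the ratio is undefined; we
    return 0.0 because such tokens can never be proposed by the source.
    """
    return [t / s if s > 0 else 0.0 for t, s in zip(target_probs, source_probs)]


def guide_with_ratio(source_probs, ratio):
    """Reweight a categorical distribution by a ratio and renormalize."""
    weights = [p * r for p, r in zip(source_probs, ratio)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no overlap: ratio-weighted source mass is zero")
    return [w / total for w in weights]


# Toy target distribution over a four-token vocabulary (illustrative values).
target = [0.1, 0.2, 0.3, 0.4]

# Good overlap: the source covers every token the target uses.
source_good = [0.25, 0.25, 0.25, 0.25]
guided_good = guide_with_ratio(source_good, exact_ratio(target, source_good))

# Poor overlap: the source never produces the token the target favors most.
source_poor = [0.4, 0.4, 0.2, 0.0]
guided_poor = guide_with_ratio(source_poor, exact_ratio(target, source_poor))

print("good overlap:", [round(p, 3) for p in guided_good])
print("poor overlap:", [round(p, 3) for p in guided_poor])
```

With full overlap the reweighted source matches the target. When the source assigns no mass to the last token, no ratio can restore it, so the guided distribution spreads the target's mass over the remaining tokens instead.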

Abstract

Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically depends on large training datasets, challenging the performance of DMs in small-data regimes -- common under real-world constraints. Aimed at this challenge, recent work in continuous DMs suggests that transfer learning via classifier ratio-based guidance can adapt a pretrained DM to a related target distribution, often outperforming alternatives such as full-weight fine-tuning on the target data. By contrast, transfer learning for discrete DMs remains unexplored. We address this gap by exploring practical analogues of ratio-based transfer learning for discrete DMs. Our theoretical analysis shows that a direct extension of existing ratio-based guidance is computationally prohibitive, scaling…
