Tractable Transformers for Flexible Conditional Generation
Anji Liu, Xuejie Liu, Dayuan Zhao, Mathias Niepert, Yitao Liang, and Guy Van den Broeck

TL;DR
This paper introduces Tracformer, a Transformer-based model designed for robust and flexible conditional generation, outperforming recent diffusion and autoregressive models in text modeling tasks.
Contribution
The paper presents Tracformer, a novel Transformer architecture that effectively captures local and global context for improved conditional generation performance.
Findings
Achieves state-of-the-art results on conditional text generation tasks.
Demonstrates robustness across diverse conditional generation scenarios.
Outperforms recent diffusion and autoregressive models.
Abstract
Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries (i.e., the set of unknown variables) unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based…
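To make the notion of a "conditional probability query" concrete, here is a minimal Python sketch. It is not taken from the paper. It treats a query as a choice of which positions are unknown, then contrasts a scattered random mask with a contiguous span to be filled in. The helper names (make_query, random_mask, span_mask, num_queries) and both masking schemes are hypothetical and chosen only for illustration; they are assumptions, not Tracformer's training procedure.

```python
import random

MASK = "[MASK]"  # placeholder symbol, chosen for illustration only


def make_query(tokens, unknown):
    """Split a sequence into observed evidence and unknown target positions."""
    unknown = set(unknown)
    # Evidence view: unknown positions are hidden behind the mask symbol.
    masked = [MASK if i in unknown else tok for i, tok in enumerate(tokens)]
    # Targets: the values a model would have to generate for this query.
    targets = {i: tokens[i] for i in sorted(unknown)}
    return masked, targets


def random_mask(n, ratio, rng):
    """Assumed scattered pattern: hide round(ratio * n) positions uniformly at random."""
    k = max(1, round(ratio * n))
    return sorted(rng.sample(range(n), k))


def span_mask(start, length):
    """Assumed contiguous pattern: hide one span, as in text infilling."""
    return list(range(start, start + length))


def num_queries(n):
    """Number of distinct non-empty sets of unknown positions for length n."""
    return 2 ** n - 1


tokens = "the cat sat on the mat".split()
rng = random.Random(0)

scattered = make_query(tokens, random_mask(len(tokens), 0.5, rng))
contiguous = make_query(tokens, span_mask(2, 3))

print("scattered query: ", scattered[0])
print("contiguous query:", contiguous[0])
print("possible queries for length", len(tokens), "=", num_queries(len(tokens)))
```

Even for a six-token sequence there are 63 possible sets of unknown positions, and the count grows exponentially with length, so a model will generally meet query patterns at test time that it never saw during training.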
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Cellular Automata and Applications
Methods: Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Attention Dropout · Linear Layer · Multi-Head Attention · Sparse Transformer · Diffusion
