Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards
Orhun Buğra Baran, Melih Kandemir, Ramazan Gokberk Cinbis

TL;DR
This paper introduces a reinforcement learning framework for autoregressive image models that optimizes both sample quality and diversity using novel distribution- and instance-level rewards, leading to improved image generation performance.
Contribution
It proposes a new RL-based tuning method with a distribution-level LOO-FID reward and integrates multiple instance-level rewards for enhanced image synthesis.
Findings
Improved quality and diversity metrics on LlamaGen and VQGAN architectures.
Achieved competitive sample quality without Classifier-Free Guidance.
Reduced inference cost by bypassing traditional guidance methods.
Abstract
Autoregressive (AR) models are highly effective for image generation, yet their standard maximum-likelihood estimation training lacks direct optimization for sample quality and diversity. While reinforcement learning (RL) has been used to align diffusion models, these methods typically suffer from output diversity collapse. Similarly, concurrent RL methods for AR models rely strictly on instance-level rewards, often trading off distributional coverage for quality. To address these limitations, we propose a lightweight RL framework that casts token-based AR synthesis as a Markov Decision Process, optimized via Group Relative Policy Optimization (GRPO). Our core contribution is the introduction of a novel distribution-level Leave-One-Out FID (LOO-FID) reward; by leveraging an exponential moving average of feature moments, it explicitly encourages sample diversity and prevents mode…
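The two ingredients named in the abstract, a leave-one-out distribution-level reward and group-relative normalization of rewards, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a diagonal-covariance Fréchet distance (full FID uses full covariance matrices of Inception features), defines the reward for a sample as the change in distance when that sample is removed from the batch, and leaves out the moving-average bookkeeping. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def moments(feats):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n, d = len(feats), len(feats[0])
    mu = [sum(f[k] for f in feats) / n for k in range(d)]
    var = [sum((f[k] - mu[k]) ** 2 for f in feats) / n for k in range(d)]
    return mu, var


def frechet_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with DIAGONAL covariances.

    Simplifying assumption for this sketch: real FID uses full covariances.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


def loo_rewards(feats, ref_mu, ref_var):
    """Hypothetical leave-one-out reward: distance without sample i minus
    distance with all samples. Positive means sample i pulls the batch
    statistics closer to the reference statistics."""
    full = frechet_diag(*moments(feats), ref_mu, ref_var)
    rewards = []
    for i in range(len(feats)):
        rest = feats[:i] + feats[i + 1:]
        without_i = frechet_diag(*moments(rest), ref_mu, ref_var)
        rewards.append(without_i - full)
    return rewards


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: subtract the group mean, divide by the group std."""
    m, s = mean(rewards), pstdev(rewards)
    return [(r - m) / (s + eps) for r in rewards]


# Toy usage: three 1-D "features", the third is an outlier w.r.t. the reference.
feats = [[0.0], [0.0], [10.0]]
advantages = group_relative_advantages(loo_rewards(feats, [0.0], [1.0]))
```

In this toy batch, removing the outlier lowers the distance to the reference, so the outlier receives a negative advantage and the other two samples receive equal positive ones. Because the reward depends on the whole batch rather than on each image alone, it can penalize a sample that looks fine in isolation but skews the batch statistics.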
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Domain Adaptation and Few-Shot Learning
