Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards

Orhun Buğra Baran; Melih Kandemir; Ramazan Gokberk Cinbis

arXiv:2603.23086 · cs.LG · March 25, 2026

Open Access

TL;DR

This paper introduces a reinforcement learning framework for autoregressive image models that optimizes both sample quality and diversity using novel distribution- and instance-level rewards, improving quality and diversity metrics and reaching competitive sample quality without classifier-free guidance.

Contribution

It proposes an RL-based tuning method that pairs a distribution-level Leave-One-Out FID (LOO-FID) reward with multiple instance-level rewards to improve both sample quality and diversity.
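The page does not spell out how LOO-FID is computed, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard Fréchet distance between Gaussians fitted to feature statistics (the quantity behind FID), keeps an exponential moving average (EMA) of the generated-feature mean and second moment, and scores each sample by how much leaving it out of the current batch changes the distance to reference statistics. The names `EMAMoments`, `loo_fid_rewards` and `decay`, the blending rule, and the reward definition (FID without sample i minus FID of the full batch) are hypothetical; real FID is computed on Inception features rather than the toy 2-D vectors used here.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric PSD matrix via eigendecomposition (negative eigenvalues clipped)."""
    vals, vecs = np.linalg.eigh((mat + mat.T) / 2.0)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    root1 = psd_sqrt(cov1)
    # Tr((cov1 cov2)^{1/2}) equals Tr((root1 cov2 root1)^{1/2}), which stays symmetric.
    cross = psd_sqrt(root1 @ cov2 @ root1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * np.trace(cross))


class EMAMoments:
    """Hypothetical running estimate of the generated-feature mean and second moment."""

    def __init__(self, mean, second, decay=0.9):
        self.mean = np.asarray(mean, dtype=float)
        self.second = np.asarray(second, dtype=float)
        self.decay = decay

    def blended(self, feats):
        """(mean, cov) after mixing the EMA moments with the moments of `feats`."""
        d = self.decay
        mean = d * self.mean + (1.0 - d) * feats.mean(axis=0)
        second = d * self.second + (1.0 - d) * (feats.T @ feats) / len(feats)
        return mean, second - np.outer(mean, mean)

    def update(self, feats):
        """Fold a batch into the running moments."""
        d = self.decay
        self.mean = d * self.mean + (1.0 - d) * feats.mean(axis=0)
        self.second = d * self.second + (1.0 - d) * (feats.T @ feats) / len(feats)


def loo_fid_rewards(feats, ema, ref_mu, ref_cov):
    """Reward_i = FID(batch without i) - FID(full batch); positive means sample i lowers FID."""
    full = frechet_distance(*ema.blended(feats), ref_mu, ref_cov)
    rewards = np.empty(len(feats))
    for i in range(len(feats)):
        rest = np.delete(feats, i, axis=0)
        rewards[i] = frechet_distance(*ema.blended(rest), ref_mu, ref_cov) - full
    ema.update(feats)  # the current batch then joins the running statistics
    return rewards
```

With a reference Gaussian N(0, I), an EMA initialized to those same moments, and a batch of seven standard-normal feature vectors plus one far-off vector such as (50, 50), the far-off sample receives the lowest, negative reward, since removing it moves the blended statistics closest to the reference. Leaving one sample out at a time gives each image its own distribution-level signal, which is what lets a set-level metric act as a per-sample reward.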

Findings

01

Improved quality and diversity metrics on LlamaGen and VQGAN architectures.

02

Achieved competitive sample quality without Classifier-Free Guidance.

03

Reduced inference cost by removing the need for classifier-free guidance, which evaluates both a conditional and an unconditional branch at every sampling step.
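Findings 02 and 03 hinge on the cost of classifier-free guidance (CFG). In the common CFG formulation for AR samplers, each step combines conditional and unconditional next-token logits, so the model runs two branches per generated token. The sketch below illustrates that arithmetic only; the function names are hypothetical, KV caching and the batching of the two branches are ignored, and the 256-token image (a 16×16 token grid) is an illustrative size.

```python
def cfg_logits(cond_logits, uncond_logits, scale):
    """Classifier-free guidance on next-token logits: uncond + scale * (cond - uncond).

    scale = 1 recovers the conditional logits; larger values push further from the
    unconditional prediction.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def model_evaluations(num_tokens, use_cfg):
    """Model evaluations needed to sample one image token by token.

    With CFG, a conditional and an unconditional branch both run at every step.
    """
    branches = 2 if use_cfg else 1
    return num_tokens * branches


# Illustrative 16x16 token grid: guidance doubles the per-image evaluations.
print(model_evaluations(256, use_cfg=True), model_evaluations(256, use_cfg=False))  # prints: 512 256
```

A tuned model that no longer needs guidance can drop the unconditional branch, halving the number of model evaluations per image.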

Abstract

Autoregressive (AR) models are highly effective for image generation, yet their standard maximum-likelihood training does not directly optimize sample quality or diversity. While reinforcement learning (RL) has been used to align diffusion models, these methods typically suffer from a collapse in output diversity. Similarly, concurrent RL methods for AR models rely solely on instance-level rewards, often trading distributional coverage for quality. To address these limitations, we propose a lightweight RL framework that casts token-based AR synthesis as a Markov Decision Process (MDP), optimized via Group Relative Policy Optimization (GRPO). Our core contribution is a novel distribution-level Leave-One-Out FID (LOO-FID) reward; by leveraging an exponential moving average of feature moments, it explicitly encourages sample diversity and prevents mode…
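The abstract names GRPO as the optimizer. As a hedged illustration of the generic recipe rather than the authors' code, the sketch below standardizes rewards across a group of images sampled for the same condition to obtain group-relative advantages, then applies a PPO-style clipped surrogate over each image's token sequence, where every sampled token is one action in the MDP. The helper names, the additive reward mix with `weight`, the clip value of 0.2, and the use of the population standard deviation are assumptions; the KL-to-reference penalty of the original GRPO formulation is omitted for brevity.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one group of samples
    drawn for the same condition (e.g. the same class label)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def combine_rewards(instance, distribution, weight=1.0):
    """Hypothetical mix of per-sample instance-level rewards and a per-sample
    distribution-level reward (such as a leave-one-out FID score)."""
    return [a + weight * b for a, b in zip(instance, distribution)]


def clipped_token_loss(logp_new, logp_old, advantage, clip=0.2):
    """Negative PPO-style clipped surrogate, averaged over one image's tokens.

    Every token of a sample shares that sample's group-relative advantage.
    """
    total = 0.0
    for new, old in zip(logp_new, logp_old):
        ratio = math.exp(new - old)
        clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
        total += -min(ratio * advantage, clipped * advantage)
    return total / len(logp_new)


# Usage: four images sampled for one condition, scored by both reward types.
rewards = combine_rewards([0.8, 0.5, 0.9, 0.2], [0.1, -0.3, 0.0, 0.2], weight=0.5)
advantages = group_advantages(rewards)
```

Standardizing within the group removes the need for a learned value function: each image is judged only relative to its siblings generated under the same condition.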

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Domain Adaptation and Few-Shot Learning