Accelerating Inference of Masked Image Generators via Reinforcement Learning
Pranav Subbaraman, Shufan Li, Siyan Zhao, Aditya Grover

TL;DR
This paper introduces Speed-RL, a reinforcement learning approach for accelerating masked generative models. It reduces the number of inference steps by 3x while preserving image quality and, unlike traditional distillation methods, treats acceleration as a reward-optimization problem rather than a distribution-matching one.
Contribution
The paper presents a reinforcement learning paradigm for accelerating pretrained masked generative models, which it reports outperforms distillation-based methods in both inference speed and image quality.
Findings
Achieves a 3x reduction in inference steps while maintaining image quality.
Combining a quality reward with a speed reward lets reinforcement learning fine-tuning balance image quality against the number of sampling steps.
Outperforms traditional distillation methods in efficiency and output quality.
Abstract
Masked Generative Models (MGMs) demonstrate strong capabilities in generating high-fidelity images. However, they need many sampling steps to create high-quality generations, resulting in slow inference. In this work, we propose Speed-RL, a novel paradigm for accelerating pretrained MGMs to generate high-quality images in fewer steps. Conventional distillation methods formulate acceleration as a distribution matching problem, in which a few-step student model is trained to match the distribution generated by a many-step teacher model. We instead treat it as a reinforcement learning problem. Since the goal of acceleration is to generate high-quality images in fewer steps, we can combine a quality reward with a speed reward and fine-tune the base model using reinforcement learning with the combined reward as the optimization target. Through extensive…
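To make the idea of a combined reward concrete, here is a minimal sketch of one way a quality term and a speed term could be merged into a single scalar. The abstract does not give the actual reward formulation, so everything here is an assumption for illustration: the function names `speed_reward` and `combined_reward`, the linear form of the speed term, the additive combination, and the `speed_weight` parameter are all hypothetical and should not be read as the paper's method.

```python
def speed_reward(num_steps: int, max_steps: int) -> float:
    """Hypothetical speed term: 1.0 for a single step, 0.0 at max_steps.

    The linear shape is an assumption, not taken from the paper.
    """
    if not 1 <= num_steps <= max_steps:
        raise ValueError("num_steps must be in [1, max_steps]")
    if max_steps == 1:
        return 1.0
    return (max_steps - num_steps) / (max_steps - 1)


def combined_reward(
    quality: float,
    num_steps: int,
    max_steps: int,
    speed_weight: float = 0.5,
) -> float:
    """Hypothetical combined reward: quality plus a weighted speed bonus.

    `quality` stands in for the output of some image-quality reward model;
    `speed_weight` (an assumed hyperparameter) sets how strongly fewer
    sampling steps are favoured over quality.
    """
    return quality + speed_weight * speed_reward(num_steps, max_steps)


if __name__ == "__main__":
    # Same quality score, different step counts: the shorter run scores higher.
    slow = combined_reward(quality=0.8, num_steps=24, max_steps=24)
    fast = combined_reward(quality=0.8, num_steps=8, max_steps=24)
    print(f"24 steps: {slow:.3f}, 8 steps: {fast:.3f}")
```

Under these assumptions, two samples with the same quality score are ranked by how few steps they used, so a reinforcement learning objective built on this scalar would push the model toward shorter sampling schedules only as long as quality does not drop by more than the speed bonus gained.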
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Image Enhancement Techniques
