DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Ying Yang; Zhengyao Lv; Tianlin Pan; Haofan Wang; Binxin Yang; Hubery Yin; Chen Li; Chenyang Si

arXiv:2512.02931·cs.CV·December 3, 2025

Open Access·4 Reviews

TL;DR

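This paper addresses the limited sample diversity of bitwise autoregressive image models. It traces the problem to the binary-classification nature of bitwise prediction and to overly sharp logits distributions, then proposes DiverseAR, which adaptively smooths the logits and uses an energy-based path search to increase diversity without sacrificing quality.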

Contribution

The paper introduces DiverseAR, a novel method that improves sample diversity in bitwise AR models through adaptive logits scaling and energy-based path search.

Findings

01. Significant increase in image diversity with maintained quality

02. Adaptive logits scaling effectively smooths prediction distributions

03. Energy-based search preserves high-fidelity image generation

Abstract

In this paper, we investigate the underexplored challenge of sample diversity in autoregressive (AR) generative models with bitwise visual tokenizers. We first analyze the factors that limit diversity in bitwise AR models and identify two key issues: (1) the binary classification nature of bitwise modeling, which restricts the prediction space, and (2) the overly sharp logits distribution, which causes sampling collapse and reduces diversity. Building on these insights, we propose DiverseAR, a principled and effective method that enhances image diversity without sacrificing visual quality. Specifically, we introduce an adaptive logits distribution scaling mechanism that dynamically adjusts the sharpness of the binary output distribution during sampling, resulting in smoother predictions and greater diversity. To mitigate potential fidelity loss caused by distribution smoothing, we…
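The adaptive scaling idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact rule, so everything here is an assumption made for illustration: the function names, the `target_conf` parameter, and the use of bisection are hypothetical. The sketch picks a temperature so that the mean maximum bit probability (the target a reviewer comment on this page attributes to the paper) drops to a chosen level, then samples bits from the smoothed distribution.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(x, dtype=float)))

def adaptive_temperature(logits, target_conf=0.8, iters=50):
    """Hypothetical rule: find tau >= 1 such that the mean max-bit
    probability, mean(sigmoid(|logit| / tau)), is at most target_conf."""
    if not 0.5 < target_conf <= 1.0:
        raise ValueError("target_conf must lie in (0.5, 1]")
    mag = np.abs(np.asarray(logits, dtype=float))

    def mean_conf(tau):
        return float(np.mean(sigmoid(mag / tau)))

    if mean_conf(1.0) <= target_conf:
        return 1.0  # already smooth enough, leave logits untouched
    lo, hi = 1.0, 2.0
    while mean_conf(hi) > target_conf:  # grow the bracket
        hi *= 2.0
    for _ in range(iters):  # bisection: mean_conf decreases as tau grows
        mid = 0.5 * (lo + hi)
        if mean_conf(mid) > target_conf:
            lo = mid
        else:
            hi = mid
    return hi

def sample_bits(logits, target_conf=0.8, rng=None):
    """Sample independent bits from the temperature-smoothed distribution."""
    rng = np.random.default_rng() if rng is None else rng
    tau = adaptive_temperature(logits, target_conf)
    p_one = sigmoid(np.asarray(logits, dtype=float) / tau)
    bits = (rng.random(p_one.shape) < p_one).astype(np.int8)
    return bits, tau

if __name__ == "__main__":
    sharp_logits = np.array([8.0, -9.0, 7.0, -10.0])
    bits, tau = sample_bits(sharp_logits, rng=np.random.default_rng(0))
    print("temperature:", tau, "bits:", bits)
```

Because each bit is a two-way choice, dividing the logit by a temperature above 1 pulls its probability toward 0.5, which is what makes otherwise near-deterministic bits vary across samples.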

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01·Rating 0·Confidence 1

Strengths

N/A

Weaknesses

It is clear that the submission does not comply with the ICLR 2026 formatting guidelines. The authors have intentionally and significantly reduced the top and side margins of the manuscript. This modification allows them to fit substantially more content than permitted, effectively bypassing the page limit. This provides an unfair advantage over other authors who have adhered to the submission rules.

Reviewer 02·Rating 0·Confidence 1

Strengths

N/A.

Weaknesses

The authors have modified the official ICLR template by reducing the page margins (specifically, the sidebar spacing). This change effectively allows them to include more content than permitted under the standard format, which constitutes a clear formatting violation. Given that ICLR explicitly requires all submissions to strictly follow the provided style file and formatting guidelines, I believe this submission should be considered for desk rejection due to non-compliance with the conference

Reviewer 03·Rating 6·Confidence 4

Strengths

1. Strong and Comprehensive Empirical Validation
2. Clear Writing and Strong Logic

Weaknesses

1. Energy-based pathfinding requires sampling $M$ paths and forward propagating $N$ steps, introducing additional computational overhead (approximately 1.12x according to Table 2). Although the authors refer to this as "minimal," it remains a factor to consider in large-scale inference deployments.
2. The core contribution of this paper lies primarily in the sampling function during the inference phase, without altering the model architecture or the training process itself. While addressing sam
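The procedure described in the first point (sample $M$ candidate paths, propagate each for $N$ steps, keep one) can be sketched roughly as below. The review does not say how the energy is defined, so this sketch assumes, purely for illustration, that a path's energy is its negative log-likelihood under the unsmoothed logits. `energy_path_search`, `logits_fn`, and `toy_logits_fn` are hypothetical names and are not taken from the paper.

```python
import numpy as np

def bernoulli_logprob(bits, logits):
    """Log-probability of a bit vector under independent Bernoulli(sigmoid(logits))."""
    signed = np.where(bits == 1, logits, -logits)
    # log sigmoid(z) = -log(1 + exp(-z)), computed stably with logaddexp.
    return float(-np.logaddexp(0.0, -signed).sum())

def energy_path_search(logits_fn, prefix, num_paths, num_steps, tau, rng):
    """Sample num_paths (M) continuations of num_steps (N) tokens each from the
    temperature-smoothed distribution and return the lowest-energy one.
    Assumed energy: negative log-likelihood under the original (tau = 1) logits."""
    best_path, best_energy = None, np.inf
    for _ in range(num_paths):
        path, energy = list(prefix), 0.0
        for _ in range(num_steps):
            logits = np.asarray(logits_fn(path), dtype=float)
            p_one = 0.5 * (1.0 + np.tanh(0.5 * logits / tau))  # smoothed sigmoid
            bits = (rng.random(logits.shape) < p_one).astype(np.int8)
            energy -= bernoulli_logprob(bits, logits)
            path.append(bits)
        if energy < best_energy:
            best_path, best_energy = path, energy
    return best_path, best_energy

def toy_logits_fn(path):
    # Stand-in for a model forward pass; a real model would condition on `path`.
    return np.array([4.0, -4.0, 2.0, -2.0]) + 0.5 * len(path)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path, energy = energy_path_search(toy_logits_fn, [], num_paths=4,
                                      num_steps=3, tau=2.0, rng=rng)
    print("chosen path length:", len(path), "energy:", energy)
```

This naive loop makes `num_paths * num_steps` forward calls, which is the source of overhead the reviewer is pointing at; how the actual method keeps the cost near the 1.12x figure cited from Table 2 is not described on this page.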

Reviewer 04·Rating 4·Confidence 3

Strengths

- The analysis in Section 3.2 provides insightful explanations for the phenomenon that bitwise autoregressive models exhibit limited diversity, revealing their tendency to become overconfident at early generation scales.
- The experiments and ablation studies are comprehensive and well-designed, clearly demonstrating that DiverseAR achieves superior diversity compared to baseline models.

Weaknesses

- This paper uses "LPIPS" and "CLIP" as metrics to compare generation diversity. These notations are similar to the commonly used "LPIPS score" and "CLIP score" for visual quality comparison and may lead to confusion.
- For adaptive temperature scaling, the target of maximum bit probability is one possible choice, but other targets such as entropy could also be considered. Including an ablation study over different targets would provide a more comprehensive understanding of the scaling mechanism
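To make the second point concrete, an entropy-based target could look something like the sketch below. This illustrates the reviewer's suggestion and is not anything the paper is known to implement: `temperature_for_entropy` and `target_nats` are hypothetical names, and the bisection search is an assumption.

```python
import numpy as np

def mean_binary_entropy(logits, tau):
    """Mean entropy (in nats) of independent bits with logits / tau."""
    z = np.abs(np.asarray(logits, dtype=float)) / tau
    p_max = 0.5 * (1.0 + np.tanh(0.5 * z))  # probability of the likelier bit
    # H(sigmoid(z)) = log(1 + exp(-z)) + (1 - sigmoid(z)) * z
    return float(np.mean(np.logaddexp(0.0, -z) + (1.0 - p_max) * z))

def temperature_for_entropy(logits, target_nats=0.5, iters=50):
    """Hypothetical rule: smallest tau >= 1 whose mean bit entropy reaches
    target_nats. Binary entropy is capped at ln 2, so the target must be below it."""
    if not 0.0 < target_nats < np.log(2.0):
        raise ValueError("target_nats must lie in (0, ln 2)")
    if mean_binary_entropy(logits, 1.0) >= target_nats:
        return 1.0
    lo, hi = 1.0, 2.0
    while mean_binary_entropy(logits, hi) < target_nats:  # grow the bracket
        hi *= 2.0
    for _ in range(iters):  # bisection: entropy increases with tau
        mid = 0.5 * (lo + hi)
        if mean_binary_entropy(logits, mid) < target_nats:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    sharp_logits = np.array([8.0, -9.0, 7.0, -10.0])
    print("temperature:", temperature_for_entropy(sharp_logits, 0.5))
```

An entropy target weights near-uniform and near-deterministic bits differently from a maximum-probability target, which is presumably why the reviewer asks for an ablation across targets.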

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Domain Adaptation and Few-Shot Learning