Scaling Group Inference for Diverse and High-Quality Generation
Gaurav Parmar, Or Patashnik, Daniil Ostashev, Kuan-Chieh Wang, Kfir Aberman, Srinivasa Narasimhan, Jun-Yan Zhu

TL;DR
This paper presents a scalable group inference method that enhances diversity and quality of multiple generated samples simultaneously, addressing redundancy and limited exploration in traditional independent sampling.
Contribution
The authors introduce a novel quadratic integer assignment framework for group inference that improves diversity and quality while maintaining efficiency across various generative tasks.
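One way to write such an objective is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $M$ candidates, a group size $K$, a binary indicator $x_i$ for selecting candidate $i$, a quality score $u_i$ (unary term), a pairwise dissimilarity $d_{ij}$ (binary term), and a trade-off weight $\lambda$.

```latex
\begin{aligned}
\max_{x \in \{0,1\}^{M}} \quad & \sum_{i=1}^{M} u_i \, x_i \;+\; \lambda \sum_{1 \le i < j \le M} d_{ij} \, x_i \, x_j \\
\text{subject to} \quad & \sum_{i=1}^{M} x_i = K
\end{aligned}
```

The product $x_i x_j$ is what makes the problem quadratic: a pair contributes diversity only when both of its members are selected.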
Findings
Significant improvement in group diversity and quality over baselines.
Method scales efficiently to large candidate sets through progressive pruning.
Applicable across multiple generative tasks including text-to-image and video generation.
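The progressive pruning mentioned in the findings can be illustrated with a minimal sketch. Everything named here is hypothetical and not the authors' implementation: `step_fn` stands in for one generation step, `keep_fn` stands in for whatever criterion chooses the survivors, and `prune_at`, `ratio`, and `min_keep` are illustrative parameters.

```python
import math


def progressive_prune(candidates, step_fn, keep_fn, num_steps,
                      prune_at, ratio=0.5, min_keep=4):
    """Advance every candidate step by step, shrinking the pool at chosen steps.

    step_fn(candidate, t) -> candidate    hypothetical one-step update
    keep_fn(pool, n) -> list of indices   hypothetical rule picking n survivors
    """
    pool = list(candidates)
    for t in range(num_steps):
        # Only the surviving candidates pay for this step.
        pool = [step_fn(c, t) for c in pool]
        if t in prune_at and len(pool) > min_keep:
            n_keep = max(min_keep, math.ceil(len(pool) * ratio))
            kept = keep_fn(pool, n_keep)
            pool = [pool[i] for i in kept]
    return pool


def top_n(pool, n):
    """Toy survivor rule: indices of the n largest values."""
    return sorted(range(len(pool)), key=lambda i: pool[i], reverse=True)[:n]


# Toy run: numbers stand in for partially generated samples.
survivors = progressive_prune(
    candidates=[float(v) for v in range(16)],
    step_fn=lambda c, t: c + 1.0,
    keep_fn=top_n,
    num_steps=4,
    prune_at={0, 2},
)
print(len(survivors))
```

The saving comes from the shrinking pool: in the toy run, 16 candidates are advanced at the first step, 8 at the next two, and 4 at the last. A real `keep_fn` would score intermediate predictions rather than raw numbers.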
Abstract
Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a set of multiple images (e.g., 4-8) for each prompt, where independent sampling tends to lead to redundant results, limiting user choices and hindering idea exploration. In this work, we introduce a scalable group inference method that improves both the diversity and quality of a group of samples. We formulate group inference as a quadratic integer assignment problem: candidate outputs are modeled as graph nodes, and a subset is selected to optimize sample quality (unary term) while maximizing group diversity (binary term). To substantially improve runtime efficiency, we progressively prune the candidate set using intermediate predictions, allowing our…
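The selection step described in the abstract can be illustrated on a toy problem. The sketch below is an exhaustive search over all subsets, not the solver used in the paper, and it is only practical for small candidate sets. The names `quality`, `distance`, `k`, and `lam` are assumptions introduced for illustration.

```python
from itertools import combinations


def group_score(subset, quality, distance, lam):
    """Unary quality sum plus lam times the pairwise diversity sum."""
    unary = sum(quality[i] for i in subset)
    # Each unordered pair inside the subset is counted once.
    binary = sum(distance[i][j] for i, j in combinations(subset, 2))
    return unary + lam * binary


def select_group(quality, distance, k, lam=1.0):
    """Return the k-subset of candidate indices with the highest group score."""
    return max(
        combinations(range(len(quality)), k),
        key=lambda s: group_score(s, quality, distance, lam),
    )


# Candidates 0 and 1 are near-duplicates (distance 0) with the highest quality.
quality = [0.9, 0.8, 0.6, 0.5]
distance = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
print(select_group(quality, distance, k=2))  # (0, 2)
```

Picking by quality alone (`lam=0.0`) returns the two near-duplicates `(0, 1)`. With the diversity term enabled, the second slot goes to a distinct candidate instead.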
Peer Reviews
Decision: ICLR 2026 Poster
- The paper addresses the practically relevant setting of generating a group of images rather than a single sample, which is common in many real-world applications.
- The proposed selection strategy leads to substantial gains over baselines, according to human evaluations. The visual examples clearly illustrate the improvements of the method.
- The paper is clearly written and easy to follow.
- The method is significantly slower than the baseline that directly generates four images. According to the paper, the proposed pipeline takes approximately 2.5 times longer. Given the same time, a straightforward baseline can generate approximately ten images, while the proposed method generates only four.
- The method does not scale efficiently with respect to the number of candidates $M$ and the pruning ratio $p$. QIP solving grows quickly with $M$, and the required decoding and scoring intr
- The paper addresses an important problem: improving the diversity of modern diffusion models, which excel at producing high-quality images yet suffer from a lack of variation.
- It is interesting to cast diversity-enhancing inference as a QIP problem.
- The related work section omits several important and relevant diversity-promoting diffusion methods, such as CADS [1], Shielded Diffusion [2], and DiversityFlow [3].
- Experimental comparison with existing baselines is limited.
  1. The paper only compares with Particle Guidance (PG) and Interval Guidance (IG). More diversity-oriented baselines such as CADS [1], Shielded Diffusion [2], and DiversityFlow [3] should be incorporated to clarify the empirical benefits of the proposed approach.
S1. The paper is well-structured and clearly presented, with a solid experimental evaluation that thoroughly demonstrates all experiments substantiating the proposed method. The paper is well-organized and easy to follow.
S2. The problem addressed in the paper is highly relevant to real-world scenarios, and the proposed method effectively resolves it without requiring any additional training.
S3. The supplementary material is highly detailed, including numerous applications and qualitative res
W1. Limited novelty of the proposed method. The proposed method is too straightforward. The idea of leveraging intermediate prediction accuracy is already well established, and the algorithm simply performs pruning based on group properties.
W2. Error in line 455: the original inference-time scaling paper (orange) [1] also incorporates intermediate prediction in its algorithm (see page 8).
W3. Although the proposed method is effective for scaled group inference scenarios, it relies on a pruning (filt
Taxonomy
Topics: Gene expression and cancer classification
