BiGain: Unified Token Compression for Joint Generation and Classification
Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen

TL;DR
BiGain introduces a frequency-aware token compression framework that enhances both generation quality and classification accuracy in diffusion models, enabling faster yet effective image synthesis and recognition.
Contribution
It presents a novel, training-free, plug-and-play approach with frequency separation operators that jointly optimize diffusion model generation and classification performance.
Findings
Improves classification accuracy by over 7% on ImageNet-1K with 70% token merging.
Improves FID (lower is better), indicating better image quality, under accelerated diffusion.
Consistently improves the speed-accuracy trade-off across backbones and datasets.
Abstract
Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2)…
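The idea behind operator (1) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 4-neighbour Laplacian stencil, the `protect_ratio` parameter, the alternating source/destination split, and the cosine-similarity matching are all assumptions chosen for illustration. The sketch scores each token by the magnitude of a discrete Laplacian over the token grid, leaves the highest-scoring (high-contrast) tokens untouched, and merges only among the remaining smooth tokens by averaging.

```python
import numpy as np

def laplacian_magnitude(grid):
    """Per-token L2 norm of a 4-neighbour Laplacian over an (H, W, C) token grid."""
    grid = np.asarray(grid, dtype=float)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)), mode="edge")
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1]      # up + down neighbours
           + padded[1:-1, :-2] + padded[1:-1, 2:]    # left + right neighbours
           - 4.0 * grid)
    return np.linalg.norm(lap, axis=-1)

def laplacian_gated_merge(grid, protect_ratio=0.5):
    """Hypothetical gate: protect high-Laplacian tokens, merge among smooth ones.

    Returns (tokens_out, kept_idx), where kept_idx holds the flat grid index of
    each output token (protected tokens first, then merge destinations).
    """
    grid = np.asarray(grid, dtype=float)
    h, w, c = grid.shape
    tokens = grid.reshape(h * w, c)
    score = laplacian_magnitude(grid).reshape(-1)

    n_protect = int(round(protect_ratio * tokens.shape[0]))
    order = np.argsort(score, kind="stable")          # smoothest tokens first
    split = tokens.shape[0] - n_protect
    eligible, protected = order[:split], order[split:]

    # Assumed ToMe-style split of the smooth tokens into sources and destinations.
    src, dst = eligible[0::2], eligible[1::2]
    if dst.size == 0:                                 # nothing to merge into
        return tokens.copy(), np.arange(tokens.shape[0])

    # Match every source to its most similar destination (cosine similarity).
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    nearest = (unit[src] @ unit[dst].T).argmax(axis=1)

    # Average each destination with the sources assigned to it.
    merged = tokens[dst].copy()
    counts = np.ones(dst.size)
    np.add.at(merged, nearest, tokens[src])
    np.add.at(counts, nearest, 1.0)
    merged /= counts[:, None]

    kept_idx = np.concatenate([protected, dst])
    return np.concatenate([tokens[protected], merged]), kept_idx
```

With an 8×8 grid and `protect_ratio=0.5`, the 32 highest-contrast tokens pass through unchanged and the 32 smoothest tokens collapse to 16, so 48 tokens remain. A hard top-k gate is only one way to "discourage" merging of high-contrast tokens; a soft penalty on the similarity score would be another, and the abstract does not say which the paper uses.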
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The paper reframes token compression for diffusion models from a generation-only objective to a joint generation and discrimination objective. 2. From a frequency-domain perspective, it introduces the principle of balanced spectral retention and clearly explains why common acceleration methods tend to degrade classification accuracy earlier and more severely. 3. The empirical coverage is extensive, with comprehensive comparisons and ablations that substantiate the core claims.
1. The manuscript assumes a scenario where diffusion models must perform both generation and classification, proposing a unified lightweight compression framework for such dual-purpose usage. However, this setting appears questionable, as in practice generation and classification are typically deployed as separate models with distinct optimization objectives and performance requirements. The motivation for joint compression is therefore unclear and may not correspond to a realistic deployment se
The paper proposes a simple yet effective framework, BiGain, for unified token compression in diffusion models, addressing both generation and classification. The experimental design is thorough and well-executed—extensive experiments across multiple architectures (U-Net, DiT) and datasets (ImageNet, COCO, Oxford-IIIT Pets) convincingly demonstrate the method’s effectiveness. The analysis is comprehensive, with ablation studies and visualizations that provide clear insights into the model’s beha
While the paper presents extensive quantitative comparisons (e.g., FID, accuracy), the evaluation primarily focuses on numerical metrics. For a generation-related work, qualitative assessment is equally crucial—especially when employing lossy acceleration strategies such as token compression. However, the paper lacks visual examples of generated results, making it difficult to judge the actual perceptual quality and aesthetic fidelity of the outputs. In many cases, FID may remain stable while th
1. Innovative Problem Definition: The paper's primary strength lies in redefining the goal of token reduction for diffusion models, shifting from a single focus on generation quality to a dual objective that also includes discriminative performance. This is a novel and important perspective. 2. Strong Empirical Results: The experiments are comprehensive and demonstrate that the proposed method achieves significant improvements over baselines on multiple datasets and model architectures.
1. Insufficient Analysis of the Core Conflict: Diffusion models can be used as training-free classifiers, which implies a high correlation between their generative and discriminative capabilities. The paper fails to provide a clear analysis of why previous methods, which primarily target generative ability, cause such a severe degradation in discriminative performance. This foundational analysis is missing. 2. Limited Novelty in Dual-Objective Design: The only design in the proposed method that
1. The proposed method improves on baseline methods across different datasets. 2. The method is simple to implement and can be easily adapted to different models. 3. The paper reports experiments on multiple datasets.
1. The motivation of the paper is not clear enough: why should the diffusion model be used for a large number of discriminative tasks instead of focusing on generative tasks? The motivation for separating features in the frequency domain is also not obvious. 2. The description of the method is unclear and lacks the necessary formal language and symbolic definitions. 3. The paper lacks the visualizations and illustrations needed to convey the motivation and implementation process of the propos
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Stochastic Gradient Optimization Techniques
