ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu

TL;DR
ADDP introduces a unified framework that learns general representations for both image recognition and generation by alternating between pixel decoding and VQ token generation, achieving competitive results on ImageNet classification, COCO detection, and ADE20k segmentation.
Contribution
The paper proposes the first unified approach integrating pixel-based recognition and VQ token-based generation within a single diffusion process.
Findings
Achieves competitive results on ImageNet classification.
Demonstrates strong performance on COCO detection.
Excels in ADE20k segmentation tasks.
Abstract
Image recognition and generation have long been developed independently of each other. With the recent trend towards general-purpose representation learning, the development of general representations for both recognition and generation tasks is also promoted. However, preliminary attempts mainly focus on generation performance, but are still inferior on recognition tasks. These methods are modeled in the vector-quantized (VQ) space, whereas leading recognition methods use pixels as inputs. Our key insights are twofold: (1) pixels as inputs are crucial for recognition tasks; (2) VQ tokens as reconstruction targets are beneficial for generation tasks. These observations motivate us to propose an Alternating Denoising Diffusion Process (ADDP) that integrates these two spaces within a single representation learning framework. In each denoising step, our method first decodes pixels from…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Neural Networks and Applications
Methods: Diffusion · Focus
