RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing

Yue Gong; Hongyu Li; Shanyuan Liu; Bo Cheng; Yuhang Ma; Liebucha Wu; Xiaoyu Wu; Manyuan Zhang; Dawei Leng; Yuhui Yin; Lijun Zhang

arXiv:2603.19206·cs.CV·March 20, 2026

TL;DR

RPiAE introduces a representation-based tokenizer with a specialized training strategy that improves image generation and editing quality by balancing semantic preservation, reconstruction fidelity, and diffusion modeling efficiency.

Contribution

The paper proposes the Representation-Pivoted AutoEncoder (RPiAE), a new tokenizer that improves both image generation and editing by preserving semantics and enhancing reconstruction fidelity.

Findings

01. RPiAE outperforms existing tokenizers in text-to-image generation.

02. RPiAE achieves the best reconstruction fidelity among representation-based tokenizers.

03. RPiAE reduces diffusion modeling complexity while maintaining semantic integrity.

Abstract

Diffusion models have become the dominant paradigm for image generation and editing, with latent diffusion models shifting denoising to a compact latent space for efficiency and scalability. Recent attempts to leverage pretrained visual representation models as tokenizer priors either align diffusion features to representation features or directly reuse representation encoders as frozen tokenizers. Although such approaches can improve generation metrics, they often suffer from two problems: limited reconstruction fidelity due to frozen encoders, which in turn degrades editing quality, and overly high-dimensional latents that make diffusion modeling difficult. To address these limitations, we propose Representation-Pivoted AutoEncoder, a representation-based tokenizer that improves both generation and editing. We introduce Representation-Pivot Regularization, a training strategy that enables a…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Computer Graphics and Visualization Techniques