Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

Jiajun He; Zongyu Guo; Zhaoyang Jia; Xiaoyi Zhang; Jiahao Li; Xiao Li; Bin Li; José Miguel Hernández-Lobato; Yan Lu

arXiv:2603.07615 · cs.LG · May 5, 2026

TL;DR

This paper introduces an implicit visual representation framework that encodes a signal as a function parametrized by low-rank adaptations attached to a frozen visual generative model, enabling compact storage, inference-time scaling, and control.

Contribution

It presents a functional visual representation that achieves strong perceptual video compression at extremely low bitrates and supports inference-time scaling and control, linking visual compression and generation.

Findings

01. Achieves strong perceptual video compression at extremely low bitrates.

02. Enables inference-time scaling and control of visual signals.

03. Provides a unified framework linking visual compression and generation.

Abstract

Modern visual generative models acquire rich visual knowledge through large-scale training, yet existing visual representations (such as pixels, latents, or tokens) remain external to the model and cannot directly exploit this knowledge for compact storage or reuse. In this work, we introduce a new visual representation framework that encodes a signal as a function, which is parametrized by low-rank adaptations attached to a frozen visual generative model. Such implicit representations of visual signals, e.g., an 81-frame video, can further be hashed into a single compact vector, achieving strong perceptual video compression at extremely low bitrates. Beyond basic compression, the functional nature of this representation enables inference-time scaling and control, allowing additional refinement on the compression performance. More broadly, as the implicit representations directly act as…
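The core idea in the abstract, representing a signal by low-rank adaptations attached to a frozen model, can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the paper's method: a fixed random matrix stands in for the frozen visual generative model, the objective is a plain squared error on a single input/target pair, and the names `lora_forward`, `fit_lora`, and `adapter_size` are hypothetical. The step that hashes the representation into a single compact vector is not modeled here.

```python
import numpy as np


def lora_forward(x, W_frozen, A, B):
    """Frozen layer plus low-rank update: (W_frozen + B @ A) @ x."""
    return W_frozen @ x + B @ (A @ x)


def fit_lora(x, target, W_frozen, rank=2, steps=3000, lr=0.05, seed=0):
    """Overfit low-rank factors (A, B) to one signal; W_frozen is never updated.

    Minimizes 0.5 * ||lora_forward(x) - target||^2 by plain gradient descent
    (an assumed toy objective, not the paper's training loss).
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W_frozen.shape
    A = rng.normal(scale=0.1, size=(rank, d_in))  # small random init
    B = np.zeros((d_out, rank))                   # zero init: starts at the frozen model
    for _ in range(steps):
        err = lora_forward(x, W_frozen, A, B) - target
        grad_B = np.outer(err, A @ x)      # dL/dB = err (A x)^T
        grad_A = np.outer(B.T @ err, x)    # dL/dA = (B^T err) x^T
        B -= lr * grad_B
        A -= lr * grad_A
    return A, B


def adapter_size(A, B):
    """Number of stored values: only the low-rank factors, not the frozen weights."""
    return A.size + B.size


# Toy setup: a fixed random matrix stands in for the frozen generative model.
rng = np.random.default_rng(42)
W_frozen = rng.normal(size=(8, 16)) / 4.0
x = rng.normal(size=16)
x /= np.linalg.norm(x)             # unit-norm input keeps the step size well behaved
target = rng.normal(size=8)        # the "signal" to be represented

W_before = W_frozen.copy()
A, B = fit_lora(x, target, W_frozen)
residual = np.linalg.norm(lora_forward(x, W_frozen, A, B) - target)
print("reconstruction error:", residual)
print("stored values:", adapter_size(A, B), "vs frozen weights:", W_frozen.size)
```

In this sketch the signal lives entirely in `A` and `B`: the frozen weights are shared and never stored per signal, so the per-signal cost is `rank * (d_in + d_out)` values instead of `d_in * d_out`. Because the representation is a function rather than a fixed array, it can be re-evaluated or refined after fitting, which is the property the abstract associates with inference-time scaling and control.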
