Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models
Jiajun He, Zongyu Guo, Zhaoyang Jia, Xiaoyi Zhang, Jiahao Li, Xiao Li, Bin Li, José Miguel Hernández-Lobato, Yan Lu

TL;DR
This paper introduces a novel implicit visual representation framework that encodes signals as functions parametrized by low-rank adaptations to a frozen generative model, enabling efficient compression, scaling, and control.
Contribution
The paper presents a functional visual representation that achieves perceptual video compression at extremely low bitrates and supports inference-time scaling and control, bridging visual compression and generation.
Findings
Achieves strong perceptual video compression at extremely low bitrates.
Enables inference-time scaling and control of visual signals.
Provides a unified framework linking visual compression and generation.
Abstract
Modern visual generative models acquire rich visual knowledge through large-scale training, yet existing visual representations (such as pixels, latents, or tokens) remain external to the model and cannot directly exploit this knowledge for compact storage or reuse. In this work, we introduce a new visual representation framework that encodes a signal as a function, which is parametrized by low-rank adaptations attached to a frozen visual generative model. Such implicit representations of visual signals, e.g., an 81-frame video, can further be hashed into a single compact vector, achieving strong perceptual video compression at extremely low bitrates. Beyond basic compression, the functional nature of this representation enables inference-time scaling and control, allowing additional refinement on the compression performance. More broadly, as the implicit representations directly act as…
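To make the idea of "a signal encoded as low-rank adaptations attached to a frozen model" concrete, here is a minimal toy sketch in NumPy. It is not the paper's method: a single frozen linear layer `W` stands in for the generative model, the vector `x` stands in for the visual signal, and `fit_adapter`, `z`, `A`, and `B` are hypothetical names chosen for illustration. As a further simplification, only the factor `B` is trained while `A` stays at its random initialization, and the step that hashes the representation into a single compact vector is not modeled.

```python
import numpy as np

def fit_adapter(W, A, z, x, steps=40):
    """Fit B so that (W + B @ A) @ z approximates x, with W and A held fixed.

    Toy stand-in: W plays the frozen model, x plays the signal to encode.
    """
    a = A @ z                                # adapter activation, shape (rank,)
    lr = 1.0 / (4.0 * float(a @ a))          # step size that halves the residual per step
    B = np.zeros((W.shape[0], A.shape[0]))   # zero init: adapted model starts equal to frozen one
    losses = []
    for _ in range(steps):
        residual = W @ z + B @ a - x
        losses.append(float(residual @ residual))
        B -= lr * 2.0 * np.outer(residual, a)  # gradient of the squared error w.r.t. B
    residual = W @ z + B @ a - x
    losses.append(float(residual @ residual))
    return B, losses

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 2

W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # frozen weights (never updated)
W_before = W.copy()
A = rng.normal(size=(rank, d_in)) / np.sqrt(d_in)   # fixed low-rank factor (assumption)
z = rng.normal(size=d_in)                           # fixed model input
x = rng.normal(size=d_out)                          # "signal" to be represented

B, losses = fit_adapter(W, A, z, x)

adapter_params = A.size + B.size   # rank * (d_in + d_out)
full_params = W.size               # d_out * d_in
print(f"adapter parameters: {adapter_params}, frozen parameters: {full_params}")
print(f"reconstruction loss: {losses[0]:.3e} -> {losses[-1]:.3e}")
```

In this toy setting the signal is recovered by running the frozen layer with the adapter attached, so what needs to be stored is the adapter rather than `x` itself. The size comparison only becomes meaningful when the frozen model is shared across many signals and is far larger than the adapter, which is the regime the abstract describes.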
