MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
Jiaqi Guo, Mingzhen Li, Haohong Wang, Aggelos K. Katsaggelos

TL;DR
MetaSR is a framework that adaptively selects which metadata to inject into a diffusion-based super-resolution model, improving reconstruction quality (up to 1.0 dB PSNR) and transmission efficiency (up to 50% lower bitrate at equal quality) across diverse content types under resource constraints.
Contribution
It introduces a content-adaptive metadata orchestration method, built on a Diffusion Transformer (DiT), that outperforms fixed-conditioning approaches in diverse real-world scenarios.
Findings
MetaSR achieves up to 1.0 dB PSNR improvement over baselines.
It reduces transmission bitrate by up to 50% at equal quality.
Experiments demonstrate its effectiveness across diverse content buckets and degradation regimes.
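To put the 1.0 dB figure in perspective, the standard PSNR definition turns a dB gain into a ratio of mean squared errors. Assuming both methods are scored on the same image with the same peak value MAX, a 1.0 dB gain corresponds to roughly 21% lower MSE (a PSNR gain averaged over a dataset does not map exactly onto an averaged MSE ratio, so treat this as a per-image rule of thumb):

```latex
\mathrm{PSNR} = 10\log_{10}\frac{MAX^2}{\mathrm{MSE}}, \qquad
\Delta\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MSE}_{\text{base}}}{\mathrm{MSE}_{\text{MetaSR}}} = 1.0\ \text{dB}
\;\Longrightarrow\;
\frac{\mathrm{MSE}_{\text{MetaSR}}}{\mathrm{MSE}_{\text{base}}} = 10^{-0.1} \approx 0.794
```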
Abstract
We study generative super-resolution (SR) in real-world scenarios where content and degradations vary across domains, genres, and segments. For example, images and videos may alternate between text overlays, fast motion, smooth cartoons, and low-light faces, each benefiting from different forms of side information. Existing metadata-guided SR methods typically use a fixed conditioning design, which is suboptimal when useful cues are content-dependent and transmission budgets are limited. We propose MetaSR, a Diffusion Transformer (DiT)-based framework that selects and injects task-relevant metadata to guide SR under resource constraints. Specifically, we use the DiT's own VAE and transformer backbone to fuse heterogeneous metadata, and adopt an efficient distillation strategy that enables one-step diffusion inference. Experiments across diverse content buckets and degradation regimes…
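The abstract does not say how MetaSR decides which metadata to send. One way to picture "selecting task-relevant metadata under resource constraints" is as a budgeted subset-selection problem. The sketch below illustrates that framing under stated assumptions and is not the authors' method: each candidate metadata type has a hypothetical bit cost and a hypothetical estimated quality gain for the current segment, and an exact 0/1 knapsack dynamic program picks the subset with the highest total gain that fits the bit budget. The names `MetadataOption`, `select_metadata`, and `example_options`, and all numbers, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataOption:
    name: str       # hypothetical metadata type, e.g. "text_mask"
    bits: int       # assumed transmission cost in bits (integer)
    utility: float  # assumed estimated quality gain for this segment


def select_metadata(options, budget_bits):
    """Exact 0/1 knapsack: maximize total utility with total bits <= budget_bits.

    Returns (total_utility, tuple_of_chosen_names).
    """
    # best[b] holds the best (utility, chosen names) using at most b bits.
    # Tuples are immutable, so sharing the initial entry across slots is safe.
    best = [(0.0, ())] * (budget_bits + 1)
    for opt in options:
        # Walk budgets downward so each option is counted at most once.
        for b in range(budget_bits, opt.bits - 1, -1):
            prev_util, prev_names = best[b - opt.bits]
            cand_util = prev_util + opt.utility
            if cand_util > best[b][0]:
                best[b] = (cand_util, prev_names + (opt.name,))
    return best[budget_bits]


# Hypothetical candidates for a segment dominated by a text overlay.
example_options = [
    MetadataOption("text_mask", bits=400, utility=0.5),
    MetadataOption("edge_map", bits=300, utility=0.25),
    MetadataOption("face_landmarks", bits=200, utility=0.125),
    MetadataOption("motion_vectors", bits=500, utility=0.375),
]

if __name__ == "__main__":
    print(select_metadata(example_options, budget_bits=700))
```

With a 700-bit budget this example picks the text mask and edge map. In a real system the utilities would presumably come from a learned, content-dependent predictor rather than fixed constants, and the exact DP could be swapped for a greedy rule if the number of metadata types or the budget were large.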
