Adaptive Context Matters: Towards Provable Multi-Modality Guidance for Super-Resolution
Jinyi Luo, Minghao Liu, Yifan Li, Zejia Fan, Jiaying Liu

TL;DR
This paper introduces a theoretical framework for multi-modal super-resolution, revealing limitations of existing methods and proposing a novel adaptive fusion approach that improves generalization and semantic consistency.
Contribution
The work provides the first theoretical modeling of multi-modal SR and develops M$^3$ESR, a dynamic modality fusion framework designed to control generalization risk and to better match modality weights to their effective contributions.
Findings
M$^3$ESR significantly improves super-resolution performance.
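Theoretical analysis shows that aligning modality weights more closely with their effective contributions, while reducing representation complexity, improves the generalization risk bound.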
Extensive experiments confirm the effectiveness of the proposed method.
Abstract
Super-resolution (SR) is a severely ill-posed problem with inherent ambiguity, as widely recognized in both empirical and theoretical studies. Although recent semantic-guided and multi-modal SR methods exploit large models or external priors to enhance semantic alignment, the fusion of heterogeneous modalities remains insufficiently understood in practice and theory. In this work, we provide the first theoretical modeling of multi-modal SR, revealing that prior methods are bottlenecked by sub-optimal modality utilization. Our analysis shows that the generalization risk bound can be improved by strengthening the alignment between modality weights and their effective contributions, while reducing representation complexity. This theoretical insight inspires us to propose the novel Multi-Modal Mixture-of-Experts Super-Resolution framework (M$^3$ESR) that employs generalization-oriented…
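The abstract is cut off before it describes the architecture, so the following is only an illustrative sketch of the general idea of adaptive modality weighting, not the authors' M$^3$ESR. It assumes each modality has already been encoded into a feature vector of the same length and that some gating network has produced one scalar score per modality. The function names `softmax` and `fuse_modalities`, and the inputs `features` and `gate_scores`, are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Turn raw gate scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, gate_scores):
    """Weighted sum of per-modality feature vectors.

    features:    list of equal-length feature vectors, one per modality
                 (hypothetical stand-ins for encoder outputs).
    gate_scores: one raw score per modality (hypothetical gate output).
    Returns the fused vector and the weights that were applied.
    """
    if len(features) != len(gate_scores):
        raise ValueError("need exactly one gate score per modality")
    weights = softmax(gate_scores)
    dim = len(features[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


# Two toy modalities with equal gate scores receive equal weight.
fused, weights = fuse_modalities([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this toy setup, raising one modality's gate score shifts the fused representation toward that modality's features. The analysis summarized above concerns how well such weights track each modality's effective contribution; how M$^3$ESR actually computes or regularizes them is not stated in the truncated abstract.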
