DiveUp: Learning Feature Upsampling from Diverse Vision Foundation Models
Xiaoqiong Liu, Heng Fan

TL;DR
DiveUp introduces a multi-model relational guidance framework for feature upsampling in vision foundation models, improving robustness and accuracy by leveraging diverse model consensus and geometric structure representations.
Contribution
DiveUp proposes a universal, encoder-agnostic upsampling method that draws on multiple vision foundation models and a novel relational feature representation to improve performance on pixel-level understanding tasks.
Findings
Achieves state-of-the-art results on dense prediction tasks.
Effectively filters out high-norm artifacts during upsampling.
Demonstrates robustness across diverse vision models.
Abstract
Recently, feature upsampling has gained increasing attention owing to its effectiveness in enhancing vision foundation models (VFMs) for pixel-level understanding tasks. Existing methods typically rely on high-resolution features from the same foundation model to achieve upsampling via self-reconstruction. However, relying solely on intra-model features forces the upsampler to overfit to the source model's inherent location misalignment and high-norm artifacts. To address this fundamental limitation, we propose DiveUp, a novel framework that breaks away from single-model dependency by introducing multi-VFM relational guidance. Instead of naive feature fusion, DiveUp leverages diverse VFMs as a panel of experts, utilizing their structural consensus to regularize the upsampler's learning process, effectively preventing the propagation of inaccurate spatial structures from the source…
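The abstract does not spell out how the relational guidance is computed, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes the "relational representation" is a pairwise cosine-similarity matrix over spatial locations and that "structural consensus" is a plain average of those matrices across experts. The function names (relation_matrix, consensus_relation, relational_loss) and the toy inputs are hypothetical.

```python
import math


def relation_matrix(features):
    """Pairwise cosine similarity between per-location feature vectors.

    `features` is a list of N vectors, one per spatial location. The result
    is N x N whatever the channel width, so encoders with different feature
    dimensions become directly comparable (an assumed design, see lead-in).
    """
    normed = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard zero vectors
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def consensus_relation(expert_features):
    """Average the relation matrices of several expert encoders (assumed rule)."""
    mats = [relation_matrix(f) for f in expert_features]
    n = len(mats[0])
    return [
        [sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
        for i in range(n)
    ]


def relational_loss(upsampled_features, expert_features):
    """Mean squared gap between the upsampler's relations and the consensus."""
    pred = relation_matrix(upsampled_features)
    target = consensus_relation(expert_features)
    n = len(pred)
    total = sum(
        (pred[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Toy example: three locations, two experts with channel widths 2 and 3.
expert_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
expert_b = [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 5.0]]

# Upsampler output with a fourth channel width and one large-magnitude vector.
upsampled = [[0.5, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 9.0]]
loss = relational_loss(upsampled, [expert_a, expert_b])

# An output that groups the locations differently disagrees with the consensus.
mismatched = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mismatched_loss = relational_loss(mismatched, [expert_a, expert_b])
```

In this toy setup the three feature sets have different channel widths yet yield comparable 3 x 3 relation matrices, which is the sense in which a relational target can be encoder-agnostic. Because cosine similarity discards vector magnitude, the large-norm vector in `upsampled` does not change its relations; whether DiveUp handles high-norm artifacts through this mechanism is not stated in the abstract.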
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
