DiveUp: Learning Feature Upsampling from Diverse Vision Foundation Models

Xiaoqiong Liu; Heng Fan

arXiv:2603.13571·cs.CV·March 17, 2026



TL;DR

DiveUp introduces a multi-model relational guidance framework for feature upsampling in vision foundation models, improving robustness and accuracy by leveraging diverse model consensus and geometric structure representations.

Contribution

DiveUp proposes a universal, encoder-agnostic upsampling method that is trained with guidance from multiple vision foundation models and a novel relational feature representation, with the goal of improving pixel-level understanding tasks.

Findings

01. Achieves state-of-the-art results on dense prediction tasks.

02. Effectively filters out high-norm artifacts during upsampling.

03. Demonstrates robustness across diverse vision models.
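To make the term "high-norm artifact" in the findings concrete, here is a minimal sketch that flags feature tokens whose L2 norm is far above the typical norm. It is an illustration only and is not taken from the paper: the function name `high_norm_mask`, the median-based rule, and the `ratio=3.0` threshold are all assumptions, and the summary does not say how DiveUp itself suppresses such tokens.

```python
import numpy as np

def high_norm_mask(feats, ratio=3.0):
    """Flag tokens whose L2 norm exceeds `ratio` times the median norm.

    feats: array of shape (num_tokens, channels).
    The median rule and the default ratio are illustrative assumptions.
    """
    norms = np.linalg.norm(feats, axis=-1)
    return norms > ratio * np.median(norms)

# Synthetic tokens with one deliberately inflated outlier at index 5.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))
tokens[5] *= 20.0

mask = high_norm_mask(tokens)
```

An upsampler trained only to reconstruct the source model's own features would have to reproduce tokens like the flagged one, which is the overfitting risk the abstract describes.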

Abstract

Recently, feature upsampling has gained increasing attention owing to its effectiveness in enhancing vision foundation models (VFMs) for pixel-level understanding tasks. Existing methods typically rely on high-resolution features from the same foundation model to achieve upsampling via self-reconstruction. However, relying solely on intra-model features forces the upsampler to overfit to the source model's inherent location misalignment and high-norm artifacts. To address this fundamental limitation, we propose DiveUp, a novel framework that breaks away from single-model dependency by introducing multi-VFM relational guidance. Instead of naive feature fusion, DiveUp leverages diverse VFMs as a panel of experts, utilizing their structural consensus to regularize the upsampler's learning process, effectively preventing the propagation of inaccurate spatial structures from the source…
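The idea of using structural consensus among several VFMs as a regularizer can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's formulation: it assumes the "relational" representation is a pairwise cosine-similarity matrix over spatial locations, that consensus is a plain average over experts, and that the penalty is a mean squared difference. The names `affinity`, `consensus_affinity`, and `relational_loss` are hypothetical.

```python
import numpy as np

def affinity(feats):
    """Cosine-similarity matrix (N x N) for features of shape (N, C)."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-8, None)
    return unit @ unit.T

def consensus_affinity(expert_feats):
    """Average the experts' affinity matrices (assumed consensus rule)."""
    return np.mean([affinity(f) for f in expert_feats], axis=0)

def relational_loss(upsampled, expert_feats):
    """Mean squared gap between the upsampler's affinity and the consensus."""
    gap = affinity(upsampled) - consensus_affinity(expert_feats)
    return float(np.mean(gap ** 2))

# Three hypothetical experts with different channel widths, same 16 locations.
rng = np.random.default_rng(0)
n_locations = 16
experts = [rng.normal(size=(n_locations, c)) for c in (32, 48, 64)]
upsampled = rng.normal(size=(n_locations, 24))

loss = relational_loss(upsampled, experts)
```

Because each affinity matrix has shape N x N regardless of channel width, experts with different feature dimensions can be compared directly, which is one plausible reading of why a relational representation suits an encoder-agnostic upsampler.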


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications