Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

Volodymyr Havrylov; Haiwen Huang; Dan Zhang; Andreas Geiger

arXiv:2505.02075·cs.CV·May 6, 2025

Open Access · 1 Repo

TL;DR

This paper evaluates various feature upsampling methods for Vision Foundation Models using Interactive Segmentation as a benchmark, demonstrating that proper upsampling strategies enhance dense prediction performance.

Contribution

It introduces Interactive Segmentation as a new benchmark for assessing feature upsampling methods in VFMs, highlighting the impact of upsampling choices on dense prediction quality.

Findings

1. Upsampling strategies significantly affect VFM feature quality.
2. Proper upsampling improves dense prediction accuracy.
3. Benchmarking reveals optimal methods for different scenarios.

Abstract

Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As VFMs grow in popularity, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, which limits their direct applicability in this context. One way to tackle this limitation is to employ a task-agnostic feature upsampling module that increases the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Because its input is inherently multimodal, consisting of an image and a set of user-defined clicks, and its output is a dense mask, IS creates a challenging environment that demands comprehensive visual scene understanding. Our…
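
To make the setting in the abstract concrete, the following is a minimal sketch in plain Python of the two ingredients it describes: upsampling a low-resolution feature map and turning a user click into a dense mask. It is not the paper's method. Bilinear interpolation stands in here as the simplest task-agnostic upsampler, and the click-to-mask step is a deliberately naive cosine-similarity threshold chosen for illustration. The function names `bilinear_upsample` and `mask_from_click`, the `threshold` parameter, and the toy 2x2 feature map are all hypothetical.

```python
import math


def bilinear_upsample(feat, out_h, out_w):
    """Bilinearly resize feat (H x W x C nested lists) to out_h x out_w x C."""
    in_h, in_w = len(feat), len(feat[0])
    channels = len(feat[0][0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre back to input coordinates, clamped to the grid.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbouring feature vectors channel by channel.
            row.append([
                (1 - wy) * ((1 - wx) * feat[y0][x0][c] + wx * feat[y0][x1][c])
                + wy * ((1 - wx) * feat[y1][x0][c] + wx * feat[y1][x1][c])
                for c in range(channels)
            ])
        out.append(row)
    return out


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_from_click(feat_hr, click, threshold=0.9):
    """Assumed toy rule: a pixel is foreground if its feature is close to the clicked one."""
    cy, cx = click
    ref = feat_hr[cy][cx]
    return [[cosine(px, ref) >= threshold for px in row] for row in feat_hr]


# Toy 2x2 feature map with 2 channels: left column is "object", right is "background".
low_res = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 1]],
]
high_res = bilinear_upsample(low_res, 4, 4)
mask = mask_from_click(high_res, click=(0, 0))
```

In this toy case the mask covers the left half of the 4x4 grid. The point of the sketch is only that mask quality near boundaries depends on how the upsampler fills in features between the coarse grid cells, which is the dependency the benchmark is designed to probe.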

Code & Models

Repositories

havrylovv/isegprobe (PyTorch, official)

Taxonomy

Topics: Satellite Image Processing and Photogrammetry · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training