Metric-Guided Feature Fusion of Visual Foundation Models for Segmentation Tasks

Yachan Guo, Jose Luis Gomez Zurita, Danna Xue, Yi Xiao, Antonio Manuel Lopez Pena

arXiv:2605.16864 · cs.CV · May 19, 2026

TL;DR

This paper introduces a metric-guided feature fusion method for visual foundation models that improves dense prediction tasks by effectively combining complementary features without complex architecture changes.

Contribution

The paper proposes a label-free, metric-guided approach for selecting and fusing features from different VFMs to improve dense prediction performance.

Findings

01. Achieves consistent performance gains across multiple dense prediction tasks.

02. Improves object-level semantics and boundary localization.

03. Uses a simple training scheme without complex architectural modifications.

Abstract

Although large-scale visual foundation models (VFMs) achieve remarkable performance in semantic understanding, they still underperform in instance-aware dense prediction tasks. They exhibit different biases in representation: for instance, promptable segmentation models (e.g., SAM2) focus on fine-grained region boundaries, while self-supervised models (e.g., DINOv3) emphasize object-level structure. This observation highlights the potential of combining complementary features from different VFMs to enhance downstream dense prediction tasks. However, naive multi-VFM fusion seldom leads to reliable gains, and interpretable principles for leveraging their complementary features are still underexplored. In this work, we propose a metric-guided approach that effectively selects and aggregates complementary features from different VFMs based on explicit assessment scores. Specifically, we…

Code & Models

Repositories

gyc-code/metric-guided-fusion (GitHub)