MUSE: Model-based Uncertainty-aware Similarity Estimation for zero-shot 2D Object Detection and Segmentation

Sungmin Cho; Sungbum Park; Insoo Oh

arXiv:2510.17866·cs.CV·October 22, 2025

TL;DR

MUSE is a training-free, uncertainty-aware framework for zero-shot 2D object detection and segmentation. It matches multi-view templates against object proposals with a joint similarity metric and reports state-of-the-art results on BOP Challenge 2025.

Contribution

MUSE introduces a novel training-free approach combining multi-view templates, a joint similarity metric, and uncertainty-aware refinement for zero-shot 2D object detection and segmentation.

Findings

01. Achieves state-of-the-art performance on BOP Challenge 2025.

02. Ranks first across multiple tracks without additional training.

03. Effectively combines global and local representations for robust matching.

Abstract

In this work, we introduce MUSE (Model-based Uncertainty-aware Similarity Estimation), a training-free framework designed for model-based zero-shot 2D object detection and segmentation. MUSE leverages 2D multi-view templates rendered from 3D unseen objects and 2D object proposals extracted from input query images. In the embedding stage, it integrates class and patch embeddings, where the patch embeddings are normalized using generalized mean pooling (GeM) to capture both global and local representations efficiently. During the matching stage, MUSE employs a joint similarity metric that combines absolute and relative similarity scores, enhancing the robustness of matching under challenging scenarios. Finally, the similarity score is refined through an uncertainty-aware object prior that adjusts for proposal reliability. Without any additional training or fine-tuning, MUSE achieves…
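
The embedding stage described in the abstract pools patch embeddings with generalized mean pooling (GeM). The following is a minimal NumPy sketch of that pooling step, not the authors' implementation. The exponent `p`, the clipping constant `eps`, and the helper `combine_embeddings` (which simply concatenates a class embedding with the pooled patch embedding) are assumptions made for illustration; the abstract does not state how the two embeddings are integrated.

```python
import numpy as np


def gem_pool(patches: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean pooling over the patch axis.

    patches: array of shape (num_patches, dim).
    p = 1 gives average pooling; larger p moves toward max pooling.
    The default p = 3.0 is an assumed value, not taken from the paper.
    """
    # Clip to a small positive value so the fractional power is well defined.
    clipped = np.clip(patches, eps, None)
    return np.mean(clipped ** p, axis=0) ** (1.0 / p)


def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm."""
    return v / (np.linalg.norm(v) + eps)


def combine_embeddings(cls_emb: np.ndarray, patches: np.ndarray) -> np.ndarray:
    """Hypothetical combination: concatenate the normalized class embedding
    with the normalized GeM-pooled patch embedding, then renormalize."""
    pooled = gem_pool(patches)
    return l2_normalize(np.concatenate([l2_normalize(cls_emb), l2_normalize(pooled)]))


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two descriptors."""
    return float(np.dot(l2_normalize(a), l2_normalize(b)))
```

Under these assumptions, a template descriptor and a proposal descriptor built with `combine_embeddings` could be compared with `cosine_similarity`. The joint absolute/relative similarity metric and the uncertainty-aware prior are not sketched here because the abstract does not define them.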

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Visual Attention and Saliency Detection