Uncertainty Estimation in Instance Segmentation of Affordances via Bayesian Visual Transformers

Lorenzo Mur-Labadia; Ruben Martinez-Cantin; Jose J. Guerrero

arXiv:2605.03614 · cs.CV · May 6, 2026

TL;DR

This paper introduces a Bayesian attention-based model for instance segmentation of visual affordances, improving accuracy and uncertainty estimation, with applications in robotics, AR, and prosthetics.

Contribution

It extends attention-based architectures with Bayesian ensembles for uncertainty quantification and proposes a novel measure for probabilistic mask quality.

Findings

01

Achieved +7.4 percentage points in $F_{\beta}^{w}$ score on the IIT-Aff dataset.

02

Bayesian models produce better-calibrated probabilities and less overconfidence.

03

Uncertainty estimates correlate with object contours and challenging pixels.
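To make the calibration finding concrete, the sketch below computes the expected calibration error (ECE), a common way to quantify overconfidence. It is an illustration only: this summary does not state which calibration metric the paper reports, and the function name, the flat per-pixel input layout, and the choice of ten bins are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary per-pixel predictions (hypothetical helper, not from the paper).

    probs:  predicted foreground probabilities, one per pixel.
    labels: ground-truth values (0 or 1), one per pixel.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        conf = p if pred == 1 else 1.0 - p
        # Clamp so that conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(a for _, a in bucket) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of pixels.
        ece += (len(bucket) / total) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model scores 0. A model that predicts 0.9 everywhere but is right only half the time has a gap of about 0.4, which is the kind of overconfidence the finding refers to.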

Abstract

Visual affordances identify regions in an image with potential interactions, offering a novel paradigm for scene understanding. Recognizing affordances allows autonomous robots to act more naturally and could enhance human-robot interaction, enrich augmented reality systems, and benefit prosthetic vision devices. Accurate and localized prediction of affordance regions, rather than general saliency maps, is crucial for these applications. We present a model for instance segmentation of affordances that adopts sample-based and ensemble approaches for uncertainty estimation. We extend an attention-based architecture to this novel task and show the effect of each component through detailed ablation experiments. By comparing the distribution of these different detections, we extract pixel-wise epistemic and aleatoric variances at both the semantic and spatial levels. In addition, we propose a…
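As a rough illustration of how pixel-wise epistemic and aleatoric variances can be read off a set of ensemble predictions, the sketch below applies the law of total variance to per-pixel Bernoulli probabilities. This is a standard decomposition, not necessarily the one the authors use; the function names and the nested-list mask layout (M masks of size H x W) are assumptions made for the example.

```python
from statistics import fmean


def decompose_pixel(member_probs):
    """Split one pixel's predictive variance into aleatoric and epistemic parts.

    member_probs: foreground probabilities for the same pixel, one per
    ensemble member (hypothetical input layout, not taken from the paper).
    """
    mean_p = fmean(member_probs)
    # Aleatoric: average Bernoulli variance that each member predicts.
    aleatoric = fmean([p * (1.0 - p) for p in member_probs])
    # Epistemic: disagreement between members (population variance).
    epistemic = fmean([(p - mean_p) ** 2 for p in member_probs])
    return mean_p, aleatoric, epistemic


def decompose_masks(member_masks):
    """Apply decompose_pixel to every pixel of M probability masks (each H x W)."""
    height = len(member_masks[0])
    width = len(member_masks[0][0])
    mean_map, aleatoric_map, epistemic_map = [], [], []
    for r in range(height):
        mean_row, aleatoric_row, epistemic_row = [], [], []
        for c in range(width):
            probs = [mask[r][c] for mask in member_masks]
            mean_p, aleatoric, epistemic = decompose_pixel(probs)
            mean_row.append(mean_p)
            aleatoric_row.append(aleatoric)
            epistemic_row.append(epistemic)
        mean_map.append(mean_row)
        aleatoric_map.append(aleatoric_row)
        epistemic_map.append(epistemic_row)
    return mean_map, aleatoric_map, epistemic_map
```

For each pixel the two terms sum to the total predictive variance, mean_p * (1 - mean_p). When all members agree on 0.5, the uncertainty is purely aleatoric; when half predict 0 and half predict 1, it is purely epistemic.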
