Learn to Rank: Visual Attribution by Learning Importance Ranking

David Schinagl; Christian Fruhwirth-Reisinger; Alexander Prutsch; Samuel Schulter; Horst Possegger

arXiv:2604.05819 · cs.CV · April 8, 2026

TL;DR

This paper introduces a learning-based visual attribution method for computer vision models that directly optimizes ranking metrics through differentiable permutation learning, producing more accurate, boundary-aligned explanations.

Contribution

The paper proposes an end-to-end training scheme that optimizes the deletion and insertion metrics directly via a Gumbel-Sinkhorn relaxation, rather than through a surrogate objective.
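
For readers unfamiliar with the Gumbel-Sinkhorn relaxation named above, the following is a minimal NumPy sketch of the general operator, not the authors' implementation. It assumes a square score matrix (items by rank positions); the function name `gumbel_sinkhorn` and the parameters `tau`, `noise_scale`, and `n_iters` are illustrative choices, and how the paper builds its score matrix is not stated on this page.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dims."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def gumbel_sinkhorn(scores, tau=1.0, noise_scale=1.0, n_iters=50, rng=None):
    """Relax a permutation into a (near) doubly stochastic matrix.

    scores      : square matrix, entry (i, j) scores item i at position j
                  (an assumed layout for illustration)
    tau         : temperature; smaller values give sharper matrices
    noise_scale : multiplier on the Gumbel noise (0 disables sampling)
    n_iters     : number of row/column normalization rounds
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[0] != scores.shape[1]:
        raise ValueError("scores must be a square matrix")
    rng = np.random.default_rng() if rng is None else rng

    # Sample standard Gumbel noise: -log(-log(U)), U ~ Uniform(0, 1).
    u = rng.uniform(1e-12, 1.0, size=scores.shape)
    gumbel = -np.log(-np.log(u))

    # Perturb, scale by temperature, then run Sinkhorn in log space.
    log_p = (scores + noise_scale * gumbel) / tau
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)


# Example: a 3x3 score matrix with the noise switched off.
scores = np.array([[3.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0],
                   [1.0, 0.0, 2.0]])
soft_perm = gumbel_sinkhorn(scores, tau=1.0, noise_scale=0.0)
```

Every step is a smooth function of `scores`, which is what lets gradients flow through a ranking-like object. Lowering `tau` pushes the output toward a hard permutation matrix, at the cost of less informative gradients.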

Findings

01. Achieves sharper, boundary-aligned explanations.

02. Demonstrates consistent quantitative improvements.

03. Effective for transformer-based vision models.

Abstract

Interpreting the decisions of complex computer vision models is crucial to establish trust and accountability, especially in safety-critical domains. An established approach to interpretability is generating visual attribution maps that highlight regions of the input most relevant to the model's prediction. However, existing methods face a three-way trade-off. Propagation-based approaches are efficient, but they can be biased and architecture-specific. Meanwhile, perturbation-based methods are causally grounded, yet they are expensive and for vision transformers often yield coarse, patch-level explanations. Learning-based explainers are fast but usually optimize surrogate objectives or distill from heuristic teachers. We propose a learning scheme that instead optimizes deletion and insertion metrics directly. Since these metrics depend on non-differentiable sorting and ranking, we frame…
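
To make the deletion and insertion metrics concrete, here is a minimal sketch of how they are commonly computed: pixels are ranked by attribution, removed from (or added to) the input in that order, and the area under the resulting model-score curve is reported. This is an illustration under stated assumptions, not the paper's evaluation protocol. The names `deletion_curve`, `insertion_curve`, and `curve_auc`, the constant `baseline` fill value, and the toy `model` are all hypothetical; the excerpt does not specify the baseline, step size, or granularity the authors use.

```python
import numpy as np


def _ranking(attribution):
    """Flat pixel indices ordered from most to least important."""
    return np.argsort(-np.asarray(attribution, dtype=float).ravel(), kind="stable")


def deletion_curve(model, x, attribution, baseline=0.0, n_steps=10):
    """Model score as the top-ranked pixels are progressively replaced."""
    flat = np.asarray(x, dtype=float).ravel().copy()
    scores = [model(flat.reshape(np.shape(x)))]
    for chunk in np.array_split(_ranking(attribution), n_steps):
        flat[chunk] = baseline
        scores.append(model(flat.reshape(np.shape(x))))
    return np.asarray(scores, dtype=float)


def insertion_curve(model, x, attribution, baseline=0.0, n_steps=10):
    """Model score as the top-ranked pixels are progressively revealed."""
    source = np.asarray(x, dtype=float).ravel()
    flat = np.full(source.size, baseline, dtype=float)
    scores = [model(flat.reshape(np.shape(x)))]
    for chunk in np.array_split(_ranking(attribution), n_steps):
        flat[chunk] = source[chunk]
        scores.append(model(flat.reshape(np.shape(x))))
    return np.asarray(scores, dtype=float)


def curve_auc(scores):
    """Trapezoidal area under the curve, with the x-axis scaled to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0) / (len(scores) - 1))


# Toy setup (hypothetical): the "model" only looks at the top-left pixel.
def model(img):
    return float(img[0, 0])


x = np.ones((2, 2))
good_attr = np.array([[1.0, 0.0], [0.0, 0.0]])  # points at the pixel that matters
bad_attr = np.array([[0.0, 0.0], [0.0, 1.0]])   # points at an irrelevant pixel

good_del = curve_auc(deletion_curve(model, x, good_attr, n_steps=4))
bad_del = curve_auc(deletion_curve(model, x, bad_attr, n_steps=4))
good_ins = curve_auc(insertion_curve(model, x, good_attr, n_steps=4))
```

A faithful attribution yields a low deletion AUC (the score collapses quickly) and a high insertion AUC (the score recovers quickly). The `argsort` step is the non-differentiable ranking that the abstract refers to.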
