Structured Uncertainty Similarity Score (SUSS): Learning a Probabilistic, Interpretable, Perceptual Metric Between Images
Paula Seidler, Neill D. F. Campbell, Ivor J A Simpson

TL;DR
SUSS is a novel probabilistic, interpretable perceptual similarity metric that models images with structured distributions, aligning well with human judgments and enabling transparent, localized explanations for image similarity assessments.
Contribution
It introduces a structured probabilistic model for perceptual similarity that combines interpretability with strong alignment to human perception, surpassing existing methods in transparency and calibration.
Findings
SUSS closely matches human perceptual judgments.
It provides interpretable, localized explanations of similarity.
It demonstrates competitive performance when used as a perceptual loss.
Abstract
Perceptual similarity scores that align with human vision are critical for both training and evaluating computer vision models. Deep perceptual losses, such as LPIPS, achieve good alignment but rely on complex, highly non-linear discriminative features with unknown invariances, while hand-crafted measures like SSIM are interpretable but miss key perceptual properties. We introduce the Structured Uncertainty Similarity Score (SUSS); it models each image through a set of perceptual components, each represented by a structured multivariate Normal distribution. These are trained in a generative, self-supervised manner to assign high likelihood to human-imperceptible augmentations. The final score is a weighted sum of component log-probabilities with weights learned from human perceptual datasets. Unlike feature-based methods, SUSS learns image-specific linear transformations of residuals…
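The scoring rule described in the abstract (structured multivariate Normal components, a weighted sum of their log-probabilities, and an image-specific linear transformation of residuals) can be illustrated with a small sketch. The abstract does not give the exact parameterization, so this assumes the following. Each component's residual is the distorted signal minus the reference, flattened to a vector. Its precision matrix is written as `L @ L.T` with a lower-triangular factor `L`, so `L.T @ r` is the linear transformation of the residual. The per-component weights are supplied directly; in the paper they are learned from human perceptual datasets. The helper `toy_precision_factor` is purely hypothetical. It is a hand-made stand-in for the learned, image-specific structure and is not the authors' model.

```python
import numpy as np


def log_normal_precision_chol(r, L):
    """Log-density of residual r under N(0, Sigma) with precision Sigma^{-1} = L @ L.T.

    Assumption: L is lower-triangular with a strictly positive diagonal.
    """
    d = r.shape[0]
    z = L.T @ r  # linear transformation ("whitening") of the residual
    log_det_precision = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * d * np.log(2.0 * np.pi) + 0.5 * log_det_precision - 0.5 * z @ z


def toy_precision_factor(reference, eps=0.1):
    """HYPOTHETICAL image-specific factor for a 1-D signal (not the paper's network).

    Flat regions of the reference get a larger diagonal (changes there are
    assumed to be more visible); the sub-diagonal couples neighbouring residuals.
    """
    grad = np.abs(np.diff(reference, prepend=reference[0]))
    diag = 1.0 / (eps + grad)
    L = np.diag(diag)
    # Fill the first sub-diagonal so L stays lower-triangular.
    L[np.arange(1, len(diag)), np.arange(len(diag) - 1)] = -0.5 * diag[1:]
    return L


def weighted_component_score(residuals, factors, weights):
    """Weighted sum of per-component log-probabilities (weights assumed given)."""
    return sum(w * log_normal_precision_chol(r, L)
               for r, L, w in zip(residuals, factors, weights))


ref = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
L = toy_precision_factor(ref)
small = 0.01 * np.ones(5)  # nearly imperceptible change
large = 0.3 * np.ones(5)   # clearly visible change
print(weighted_component_score([small], [L], [1.0]) >
      weighted_component_score([large], [L], [1.0]))  # True
```

Because the factor is built from the reference, the same residual can receive a different log-probability under different reference images. This is the sense in which the transformation is image-specific. A higher weighted log-probability indicates that the distortion looks more like the kind of change the model treats as imperceptible.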
Taxonomy
Topics: Visual Attention and Saliency Detection · Face Recognition and Perception · Generative Adversarial Networks and Image Synthesis
