Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

Tom Devynck; Bilal Faye; Djamel Bouchaffra; Nadjib Lazaar; Hanane Azzag; Mustapha Lebbah

arXiv:2604.06893 · cs.CV · April 15, 2026


TL;DR

Energy-Regularized Spatial Masking (ERSM) is a differentiable, energy-based feature selection method for vision models. It aims to improve robustness and interpretability by letting the network discover its own spatial masks rather than following a fixed sparsity budget.

Contribution

ERSM reformulates feature selection as a differentiable energy minimization, enabling autonomous discovery of spatial masks that improve robustness and interpretability without fixed sparsity constraints.

Findings

01. ERSM produces emergent sparsity and interpretable masks.

02. ERSM improves robustness to structured occlusion.

03. Energy ranking outperforms magnitude-based pruning in robustness tests.

Abstract

Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, yet this brute-force strategy introduces significant computational redundancy and encourages reliance on spurious background correlations. As a result, modern vision models remain brittle and difficult to interpret. We propose Energy-Regularized Spatial Masking (ERSM), a novel framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight Energy-Mask Layer inside standard convolutional backbones, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic Unary importance cost and a Pairwise spatial coherence penalty. Unlike prior pruning methods that enforce rigid sparsity budgets or rely on heuristic importance scores, ERSM allows the network to autonomously discover an…
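
To make the "two competing forces" in the abstract concrete, here is a minimal sketch of a per-token energy built from a unary term and a pairwise term. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the function names `token_energies` and `soft_mask`, the squared-difference pairwise penalty over 4-connected neighbours, the weight `lam`, the `temperature`, and the sigmoid gating. It should not be read as the authors' Energy-Mask Layer.

```python
import math


def sigmoid(x):
    """Logistic function; fine for the small values used in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def token_energies(unary, mask, lam=0.5):
    """Per-token energy on an H x W grid (illustrative assumption).

    energy[i][j] = unary[i][j] + lam * sum over 4-neighbours of
                   (mask[i][j] - mask[neighbour]) ** 2

    The unary term is a per-token importance cost (lower = more worth keeping).
    The pairwise term penalises a token whose mask value disagrees with its
    spatial neighbours, which encourages spatially coherent masks.
    """
    h, w = len(unary), len(unary[0])
    energies = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            pairwise = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    pairwise += (mask[i][j] - mask[ni][nj]) ** 2
            energies[i][j] = unary[i][j] + lam * pairwise
    return energies


def soft_mask(energies, temperature=1.0):
    """Map energies to soft keep-probabilities: low energy -> value near 1."""
    return [[sigmoid(-e / temperature) for e in row] for row in energies]


# Toy 3x3 grid: the centre token is cheap to keep, the border is costly.
unary = [[2.0, 2.0, 2.0],
         [2.0, -2.0, 2.0],
         [2.0, 2.0, 2.0]]
uniform = [[0.5] * 3 for _ in range(3)]  # initial mask with no disagreement

energies = token_energies(unary, uniform)
mask = soft_mask(energies)
for row in mask:
    print(["%.2f" % v for v in row])
```

Because the gating is a smooth function of the energy, no token count is fixed in advance: how many tokens end up near 0 depends on the energies themselves, which is one way to read the abstract's contrast with rigid sparsity budgets.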
