Dilated Convolution with Learnable Spacings makes visual models more aligned with humans: a Grad-CAM study

Rabih Chamas; Ismail Khalfaoui-Hassani; Timothee Masquelier

arXiv:2408.03164·cs.CV·August 7, 2024

TL;DR

This study demonstrates that Dilated Convolution with Learnable Spacing (DCLS) enhances the interpretability of visual models by aligning their Grad-CAM heatmaps more closely with human visual attention, compared to standard convolutions.
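The Grad-CAM heatmaps mentioned above come from a standard recipe: average the gradients of a class score over each channel of a chosen convolutional layer, use those averages to weight the channels' activations, sum, and keep only the positive part. The NumPy function below is a minimal forward-only sketch of that recipe. It assumes the activations and gradients have already been extracted from the network (for example with framework hooks) as arrays of shape (C, H, W). The function name `grad_cam` is hypothetical. Upsampling the map to image resolution, and any Threshold-Grad-CAM modification from the paper, are not reproduced here.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations A and gradients dy/dA, both (C, H, W)."""
    A = np.asarray(activations, dtype=float)
    G = np.asarray(gradients, dtype=float)
    if A.shape != G.shape or A.ndim != 3:
        raise ValueError("activations and gradients must both have shape (C, H, W)")
    # Channel weights: global-average-pooled gradients, one scalar per channel.
    alpha = G.mean(axis=(1, 2))
    # Weighted sum over channels gives an (H, W) map; ReLU keeps positive evidence.
    cam = np.maximum(np.tensordot(alpha, A, axes=1), 0.0)
    # Normalise to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 supports the class, channel 1 argues against it.
A = np.array([[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]])
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(A, G))
```

In the toy example, only the top-left cell stays positive. The bottom-right cell is driven negative by the second channel's negative weight and is clipped to zero by the ReLU.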

Contribution

The paper shows that replacing standard convolutions with DCLS improves model interpretability and introduces Threshold-Grad-CAM to further enhance heatmap quality across various models.
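A dilated convolution with kernel size k and dilation d spans k + (k - 1)(d - 1) pixels per side. For example, k = 3 and d = 2 give 3 + 2·1 = 5, using only k² weights, but those weights always sit on a fixed regular grid. DCLS instead learns where each weight sits. The function below is a simplified, forward-only NumPy sketch of that idea. It assumes bilinear interpolation is used to spread each weight, located at a continuous (learned) offset from the kernel centre, onto a dense grid that an ordinary convolution can then use. The function name `dcls_dense_kernel`, its parameters, and the clipping of positions to the grid are illustrative assumptions, not the API of the authors' repository. In a real DCLS layer the same construction is written in a differentiable framework, so gradients also flow to the positions.

```python
import numpy as np


def dcls_dense_kernel(weights, pos_y, pos_x, size):
    """Build a dense (size x size) kernel from a few weights at continuous positions.

    Simplified sketch in the spirit of DCLS: each weight is spread over its four
    neighbouring grid cells by bilinear interpolation. Positions are offsets from
    the kernel centre and are clipped so they stay inside the grid.
    """
    kernel = np.zeros((size, size), dtype=float)
    center = (size - 1) / 2.0
    for w, py, px in zip(weights, pos_y, pos_x):
        # Convert centre-relative offsets to grid coordinates inside [0, size - 1].
        y = float(np.clip(py + center, 0.0, size - 1))
        x = float(np.clip(px + center, 0.0, size - 1))
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        y1, x1 = min(y0 + 1, size - 1), min(x0 + 1, size - 1)
        dy, dx = y - y0, x - x0
        # The four bilinear coefficients sum to 1, so the total weight is preserved.
        kernel[y0, x0] += w * (1 - dy) * (1 - dx)
        kernel[y0, x1] += w * (1 - dy) * dx
        kernel[y1, x0] += w * dy * (1 - dx)
        kernel[y1, x1] += w * dy * dx
    return kernel


# Three weights (plus a learned y and x offset each) spread over a 7x7 footprint;
# a dense 7x7 kernel would need 49 weights.
k = dcls_dense_kernel([0.5, -1.0, 2.0], [-2.7, 0.0, 1.4], [0.3, 0.0, -2.2], size=7)
print(k.shape, round(k.sum(), 6))  # prints (7, 7) 1.5
```

The resulting dense kernel can be passed to any ordinary 2D convolution routine. This matches the drop-in character of the replacement described above: only the way the kernel is built changes.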

Findings

01. DCLS increases interpretability scores in seven of the eight tested models.
02. Threshold-Grad-CAM improves heatmap interpretability for models with poor initial results.
03. DCLS outperforms standard convolutions in aligning model attention with human visual strategies.

Abstract

Dilated Convolution with Learnable Spacing (DCLS) is a recent convolution method that, like dilated convolution, enlarges the receptive field (RF) without increasing the number of parameters, but without imposing a regular grid. DCLS has been shown to outperform standard and dilated convolutions on several computer vision benchmarks. Here, we show that DCLS also increases the models' interpretability, defined as their alignment with human visual strategies. To quantify it, we use the Spearman correlation between the models' Grad-CAM heatmaps and the ClickMe dataset heatmaps, which reflect human visual attention. We took eight reference models (ResNet50, ConvNeXt T, S and B, CAFormer, ConvFormer, and FastViT SA24 and SA36) and replaced their standard convolution layers with DCLS layers as a drop-in substitution. This improved the interpretability score in seven of them.…
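The interpretability score described in the abstract is a Spearman rank correlation between a model heatmap and a human heatmap. This is the Pearson correlation of the two maps' ranks, with tied values sharing the average of their ranks. The NumPy sketch below computes it for a single pair of maps. It assumes both maps are already at the same resolution; the paper's actual preprocessing of ClickMe maps (resizing, smoothing, and averaging over images) is not reproduced. The function names `average_ranks` and `spearman_score` are hypothetical. In practice `scipy.stats.spearmanr` on the flattened maps computes the same quantity.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float).ravel()
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size, dtype=float)
    i = 0
    while i < x.size:
        j = i
        # Extend j over the run of values tied with x[order[i]].
        while j + 1 < x.size and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_score(model_map, human_map):
    """Spearman correlation between two heatmaps of identical shape."""
    model_map, human_map = np.asarray(model_map), np.asarray(human_map)
    if model_map.shape != human_map.shape:
        raise ValueError("heatmaps must have the same shape")
    a = average_ranks(model_map)
    b = average_ranks(human_map)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant map has no rank variation, so the correlation is undefined.
    return float((a * b).sum() / denom) if denom > 0 else float("nan")


model = [[0.1, 0.2], [0.3, 0.4]]
human = [[1, 2], [3, 4]]
print(spearman_score(model, human))  # prints 1.0
```

Because only ranks matter, the score does not depend on how each heatmap is scaled or normalised. It only measures whether the model and humans order image regions by importance in the same way.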

Code & Models

Repositories

rabihchamas/dcls-gradcam-eval
pytorch · Official

Taxonomy

Topics: 3D Surveying and Cultural Heritage

Methods: ConvNeXt · Convolution · Dilated convolution with learnable spacings