Perceptual misalignment of texture representations in convolutional neural networks
Ludovica de Paolis, Fabio Anselmi, Alessio Ansuini, Eugenio Piasini

TL;DR
This study asks whether CNN-based texture representations (correlations between CNN features) align with human texture perception. It finds no link between how well a CNN models the visual system and how well its texture representation matches perception, suggesting that the two rely on different mechanisms.
Contribution
It quantifies the perceptual content captured by feature correlations across a diverse pool of CNNs and compares it to how well each model aligns with the mammalian visual system, revealing a disconnect between the two.
Findings
No link between a CNN's quality as a model of the visual system and the perceptual alignment of its texture representation.
CNNs do not inherently capture human-like texture perception.
Texture perception may involve mechanisms beyond CNN models.
Abstract
Mathematical modeling of visual textures traces back to Julesz's intuition that texture perception in humans is based on local correlations between image features. An influential approach for texture analysis and generation generalizes this notion to linear correlations between the nonlinear features computed by convolutional neural networks (CNNs), compiled into Gram matrices. Given that CNNs are often used as models for the visual system, it is natural to ask whether such "texture representations" spontaneously align with the textures' perceptual content, and in particular whether those CNNs that are regarded as better models for the visual system also possess more human-like texture representations. Here we quantify the perceptual content captured by feature correlations computed for a diverse pool of CNNs, and we compare it to the models' perceptual alignment with the mammalian…
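The Gram-matrix texture representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `gram_matrix`, the normalization by the number of spatial positions, and the random array standing in for real CNN activations are all assumptions made for illustration.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of one feature map.

    features: array of shape (C, H, W), e.g. the activations of one CNN layer.
    Returns a (C, C) matrix whose entry (i, j) is the spatial average of
    the product of channel i and channel j.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # one row per channel
    # Dividing by h * w is an assumed normalization; conventions vary.
    return flat @ flat.T / (h * w)


# Hypothetical stand-in for ReLU activations of a CNN layer (4 channels, 8x8).
rng = np.random.default_rng(0)
fmap = np.maximum(rng.standard_normal((4, 8, 8)), 0.0)

G = gram_matrix(fmap)
print(G.shape)
```

Because the sum runs over all spatial positions, the matrix is unchanged when those positions are shuffled. This is why it works as a summary of texture statistics rather than of spatial layout.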
