L-WISE: Boosting Human Visual Category Learning Through Model-Based Image Selection and Enhancement
Morgan B. Talbot, Gabriel Kreiman, James J. DiCarlo, and Guy Gaziv

TL;DR
This paper introduces a model-based image selection and enhancement method that significantly improves human visual categorization accuracy and training efficiency across various challenging image domains.
Contribution
It presents a novel approach using neural network-derived image perturbations and difficulty estimates to augment human visual learning, achieving substantial accuracy and time savings.
Findings
Categorization accuracy increased by 33-72%.
Training time reduced by 20-23%.
Effective across natural, histology, and dermoscopy images.
Abstract
The currently leading artificial neural network models of the visual ventral stream - which are derived from a combination of performance optimization and robustification methods - have demonstrated a remarkable degree of behavioral alignment with humans on visual categorization tasks. We show that image perturbations generated by these models can enhance the ability of humans to accurately report the ground truth class. Furthermore, we find that the same models can also be used out-of-the-box to predict the proportion of correct human responses to individual images, providing a simple, human-aligned estimator of the relative difficulty of each image. Motivated by these observations, we propose to augment visual learning in humans in a way that improves human categorization accuracy at test time. Our learning augmentation approach consists of (i) selecting images based on their…
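The difficulty estimator described in the abstract can be illustrated with a minimal sketch. It assumes, as one of the reviews below suggests, that the model's logit for the ground-truth class serves as the per-image difficulty signal (higher logit, easier image). The function names, the `fraction` parameter, and the toy logits are hypothetical and are not taken from the paper; a real pipeline would obtain the logits from a robustified network.

```python
import math
from typing import List, Sequence


def ground_truth_logit(logits: Sequence[float], true_class: int) -> float:
    """Return the model's logit for the ground-truth class of one image."""
    return logits[true_class]


def rank_easiest_first(all_logits: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> List[int]:
    """Return image indices sorted from easiest to hardest.

    Assumption: a higher ground-truth logit means the image is easier
    for humans to categorize correctly.
    """
    scores = [ground_truth_logit(l, y) for l, y in zip(all_logits, labels)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_easiest(all_logits: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   fraction: float) -> List[int]:
    """Keep the easiest `fraction` of images (at least one image)."""
    order = rank_easiest_first(all_logits, labels)
    k = max(1, math.ceil(fraction * len(order)))
    return order[:k]


# Toy logits for three images over two classes (hypothetical values).
toy_logits = [[2.0, 0.1], [0.3, 0.2], [-1.0, 4.0]]
toy_labels = [0, 0, 1]
easiest_half = select_easiest(toy_logits, toy_labels, fraction=0.5)
```

Ranking by a single scalar keeps the selection step independent of the network architecture; any model that exposes class logits could be swapped in under the same assumption.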
Peer Reviews
Decision: ICLR 2025 Poster
- Presents an innovative use of robustified ANNs to predict task difficulty and enhance images, aiding human perception and learning.
- Shows broad applicability by successfully testing across diverse domains, such as natural image classification, dermoscopy, and histology.
- Achieves practical efficiency by reducing training time and improving test-time accuracy, beneficial for fields requiring rapid, accurate human image categorization training.
- Lacks a dedicated related work section, which would help contextualize the research.
- Both low and high logits from ANNs show significant variation in human accuracy, making predictions less reliable in certain logit intervals.
- Uses only the ResNet-50 architecture, limiting generalization; further testing with models like vision transformers (ViT) is needed to support broader conclusions.
- Image enhancement may introduce biases, potentially improving accuracy only for certain major classes.
1. The application of robustified ANNs to improving human visual performance on image categorization seems like an interesting avenue.
2. L-WISE empirically demonstrates gains in categorization accuracy and training efficiency.
3. The paper addresses ethics concerns. Since the work mentions the use of clinical data, an ethics discussion is of critical importance.
1. The paper focuses on the ventral stream. However, the human visual system also has a dorsal ("where") stream, which locates an object, alongside the ventral ("what") stream, and the interplay of these two streams forms the basis of the human visual system. In this work the authors mainly focus on the ventral stream. From the quantitative data alone we can see the gains, but it is very hard to trace them back to the nuanced perturbations the ANN produces. Hence, the suggestion is to use hum
1. This paper is well-motivated, and a decent amount of technical detail is given.
2. The idea of improving the categorization performance of the novice learner by leveraging the capacity of the robustified artificial neural network is both interesting and practical.
3. The reported improvement in novice learners' performance is notable, with gains in both test accuracy and reduced training time.
1. The establishment of the empirical observations is somewhat unconvincing. Do these observations hold in more complex classification tasks or when applied to medical imaging?
2. The related work section lacks discussion of both machine teaching and human-machine vision alignment methods.
3. The number of participants is somewhat small.
4. The perception of enhanced images may be altered due to perturbations.
Taxonomy
Topics: AI in cancer detection · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques
