Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks
Chetan Ralekar, Shubham Choudhary, Tapan Kumar Gandhi, Santanu Chaudhury

TL;DR
This study compares human and deep network visual strategies in character recognition using eye-tracking and visualization maps, revealing that aligning model focus with human fixations improves accuracy without extra parameters.
Contribution
The paper introduces a novel supervision method using human eye-tracking data to guide deep networks' focus, enhancing recognition performance and interpretability.
Findings
For correctly classified characters, deep networks focus on regions similar to those that humans fixate.
Misaligned focus correlates with misclassification.
Supervising with fixation maps improves model accuracy significantly.
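The findings above can be made concrete with a small sketch. The summary does not specify how the authors measure congruence or how they formulate the supervision, so everything here is an assumption for illustration: the helper names (normalize_map, map_correlation, fixation_supervised_loss), the use of Pearson correlation as a congruence score, the KL-divergence penalty, and the weight parameter are all hypothetical. The sketch only shows one plausible way a fixation map could enter the training loss as an extra term, which adds no model parameters.

```python
import numpy as np


def normalize_map(m, eps=1e-8):
    """Turn a non-negative 2-D map into a distribution that sums to 1."""
    m = np.clip(np.asarray(m, dtype=np.float64), 0.0, None) + eps
    return m / m.sum()


def map_correlation(model_map, fixation_map):
    """Pearson correlation between two same-shaped maps.

    Assumed congruence score, not the paper's metric. Returns NaN if
    either map is constant, because its variance is then zero.
    """
    a = np.asarray(model_map, dtype=np.float64).ravel()
    b = np.asarray(fixation_map, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])


def fixation_supervised_loss(logits, label, model_map, fixation_map, weight=1.0):
    """Cross-entropy plus a penalty for diverging from the human fixation map.

    The KL(fixation || model) penalty is an assumed formulation; it is zero
    when the two normalized maps coincide and grows as they move apart.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()                 # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    cross_entropy = -log_probs[label]

    p = normalize_map(fixation_map)                 # human fixation distribution
    q = normalize_map(model_map)                    # model visualization distribution
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return float(cross_entropy + weight * kl)


# Toy 4x4 maps: humans fixate the centre; one model map agrees, one does not.
fixation = np.zeros((4, 4))
fixation[1:3, 1:3] = 1.0
aligned = fixation.copy()
misaligned = np.zeros((4, 4))
misaligned[0, 0] = 1.0

logits = [2.0, 0.5, 0.1]
loss_aligned = fixation_supervised_loss(logits, 0, aligned, fixation)
loss_misaligned = fixation_supervised_loss(logits, 0, misaligned, fixation)
```

With these toy inputs the misaligned map yields the larger loss, so gradient-based training under such a term would push the model's focus toward the regions humans fixate.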
Abstract
Human observers engage in selective information uptake when classifying visual patterns. The same is true of deep neural networks, which currently constitute the best-performing artificial vision systems. Our goal is to examine the congruence, or lack thereof, in the information-gathering strategies of the two systems. We have operationalized our investigation as a character recognition task. We have used eye-tracking to assay the spatial distribution of information hotspots for humans via fixation maps and an activation mapping technique for obtaining analogous distributions for deep networks through visualization maps. Qualitative comparison between visualization maps and fixation maps reveals an interesting correlate of congruence. For correctly classified characters, the deep learning model considered regions of the character similar to those that humans fixated. On the other…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection
