Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks

Chetan Ralekar; Shubham Choudhary; Tapan Kumar Gandhi; Santanu Chaudhury

arXiv:2108.04558 · cs.CV · August 31, 2021 · 1 cites

Open Access

TL;DR

This study compares human and deep network visual strategies in character recognition using eye-tracking and visualization maps, revealing that aligning model focus with human fixations improves accuracy without extra parameters.

Contribution

The paper introduces a novel supervision method using human eye-tracking data to guide deep networks' focus, enhancing recognition performance and interpretability.
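One way to picture fixation-based supervision is as an extra penalty added to the usual classification loss. The following is a minimal sketch under stated assumptions, not the authors' formulation: the function names, the mean-squared-error penalty, and the `weight` parameter are all hypothetical, and the model map is assumed to be non-negative (for example, taken after a ReLU). The penalty itself introduces no trainable parameters, which is consistent with the claim above.

```python
import math

def normalize(m):
    """Scale a non-negative 2D map so its entries sum to 1 (uniform if all zeros)."""
    total = sum(sum(row) for row in m)
    n = len(m) * len(m[0])
    if total <= 0:
        return [[1.0 / n for _ in row] for row in m]
    return [[v / total for v in row] for row in m]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[label], 1e-12))

def alignment_penalty(model_map, fixation_map):
    """Mean squared difference between the two normalized maps (assumed metric)."""
    a = normalize(model_map)
    b = normalize(fixation_map)
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def supervised_loss(probs, label, model_map, fixation_map, weight=1.0):
    """Classification loss plus a weighted penalty for focus that departs from human fixations."""
    return cross_entropy(probs, label) + weight * alignment_penalty(model_map, fixation_map)
```

When the model map matches the fixation map exactly, the penalty vanishes and the loss reduces to plain cross-entropy; the further the model's focus drifts from the fixated regions, the larger the added term.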

Findings

01 For correctly classified characters, deep networks focus on regions similar to those humans fixate on.

02 Misaligned focus correlates with misclassification.

03 Supervising with fixation maps improves model accuracy significantly.

Abstract

Human observers engage in selective information uptake when classifying visual patterns. The same is true of deep neural networks, which currently constitute the best performing artificial vision systems. Our goal is to examine the congruence, or lack thereof, in the information-gathering strategies of the two systems. We have operationalized our investigation as a character recognition task. We used eye-tracking to assay the spatial distribution of information hotspots for humans via fixation maps, and an activation mapping technique to obtain analogous distributions for deep networks via visualization maps. Qualitative comparison between visualization maps and fixation maps reveals an interesting correlate of congruence. For correctly classified characters, the deep learning model attended to regions of the character similar to those on which humans fixated. On the other…
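To make the two kinds of maps concrete, here is a minimal sketch of how a fixation map might be built from eye-tracking samples and compared against a model's visualization map. It rests on several assumptions that are not taken from the paper: fixations are given as hypothetical `(row, col, duration)` tuples, each one is spread with an isotropic Gaussian of width `sigma`, and agreement is scored with Pearson correlation. The abstract describes a qualitative comparison, so the numeric score here is purely illustrative.

```python
import math

def fixation_map(fixations, height, width, sigma=1.0):
    """Sum one duration-weighted Gaussian per (row, col, duration) fixation."""
    grid = [[0.0] * width for _ in range(height)]
    for r0, c0, duration in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                grid[r][c] += duration * math.exp(-d2 / (2 * sigma ** 2))
    return grid

def pearson(map_a, map_b):
    """Pearson correlation between two equally sized 2D maps (0.0 if either is constant)."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical usage: a human fixation near the top-left of a 5x5 character image,
# compared with a stand-in "model" map that peaks at the opposite corner.
human = fixation_map([(0, 0, 1.0)], 5, 5)
model = fixation_map([(4, 4, 1.0)], 5, 5)
score = pearson(human, model)
```

A score near 1 would indicate that the model and the observer concentrate on the same regions, while a low or negative score would indicate the kind of misaligned focus the findings associate with misclassification.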

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection