TL;DR
This paper introduces GDRAM, a lightweight, reinforcement-learning-based image classification model inspired by biological vision. It samples attention locations from a Gaussian distribution, mimicking the stochastic positioning of the retina, and is reported to outperform conventional CNNs that take the entire image as input.
Contribution
The paper presents GDRAM, a novel stochastic retina-inspired neural network that improves real-time large-scale image classification performance over conventional CNNs.
Findings
GDRAM outperforms CNNs on large cluttered datasets.
The model mimics the stochastic location of the retina by drawing locations from a Gaussian distribution.
GDRAM achieves high accuracy with reduced computational load.
Abstract
Previous studies on image classification have mainly focused on network performance, not on real-time operation or model compression. We propose the Gaussian Deep Recurrent visual Attention Model (GDRAM), a reinforcement-learning-based lightweight deep neural network for large-scale image classification that outperforms the conventional CNN (Convolutional Neural Network), which uses the entire image as input. Inspired by the biological visual recognition process, our model mimics the stochastic location of the retina with a Gaussian distribution. We evaluate the model on the Large cluttered MNIST, Large CIFAR-10 and Large CIFAR-100 datasets, in which images are resized to 128 × 128 pixels.
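To make the idea of a Gaussian-distributed attention location concrete, here is a minimal NumPy sketch. It draws a location from a Gaussian and crops a fixed-size patch around it from a 128 × 128 image. This is an illustration under stated assumptions, not the authors' implementation: the function names `sample_location` and `extract_glimpse`, the 32-pixel patch size, the standard deviation of 0.2, and the normalised [-1, 1] coordinate convention are all hypothetical. In a reinforcement learning setup the Gaussian mean would presumably be produced by a learned policy, which is not shown here.

```python
import numpy as np

def sample_location(mean, std, rng):
    """Draw an (x, y) location from an isotropic Gaussian, clipped to [-1, 1]."""
    loc = rng.normal(loc=mean, scale=std, size=2)
    return np.clip(loc, -1.0, 1.0)

def extract_glimpse(image, loc, size):
    """Crop a size x size patch centred at loc (normalised coords in [-1, 1])."""
    h, w = image.shape[:2]
    # Map normalised (x, y) to integer pixel coordinates.
    cx = int(round((loc[0] + 1.0) / 2.0 * (w - 1)))
    cy = int(round((loc[1] + 1.0) / 2.0 * (h - 1)))
    half = size // 2
    # Zero-pad the spatial axes so patches near the border keep a fixed shape.
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    # After padding, the original pixel (cy, cx) sits at (cy + half, cx + half).
    return padded[cy:cy + size, cx:cx + size]

# Illustrative usage; all values below are assumptions, not taken from the paper.
rng = np.random.default_rng(0)
image = rng.random((128, 128))            # stand-in for a 128 x 128 input image
loc = sample_location(mean=np.zeros(2), std=0.2, rng=rng)
patch = extract_glimpse(image, loc, size=32)
```

Under these assumed values, each step reads a 32 × 32 patch (1,024 pixels) rather than the full 128 × 128 image (16,384 pixels), which illustrates where a reduction in computation could come from.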
