CNN with large memory layers
Rasul Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky

TL;DR
This paper explores a large memory layer architecture with Cartesian product decomposition for neural networks, reports speed/accuracy improvements on some vision tasks, and introduces a re-initialization technique to address poor key utilization.
Contribution
It introduces a scalable large memory layer with Cartesian product decomposition and a re-initialization method to improve key utilization and model performance.
Findings
Memory layers improve classification and relocalization accuracy.
Re-initialization reduces unused keys and enhances training.
Memory generalizes well across tasks with spatial correlation insights.
Abstract
This work is centred on the recently proposed product key memory structure \cite{large_memory}, which we implement for a number of computer vision applications. The memory structure can be regarded as a simple computation primitive that can be added to nearly any neural network architecture. The memory block provides sparse access to memory, with a cost that scales as the square root of the memory capacity. This scaling is possible because the key space is decomposed into a Cartesian product of subspaces for the nearest neighbour search. We have tested the memory layer on classification, image reconstruction and relocalization problems and found that for some of those, the memory layers can provide significant speed/accuracy improvement with high utilization of the key-value elements, while others require more careful fine-tuning and suffer…
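To make the square-root scaling concrete, here is a minimal sketch of a product key lookup in plain Python. It is not the authors' implementation. It assumes dot-product scoring, an even-length query split into two halves, and two sub-key codebooks. All function and variable names (`product_key_lookup`, `sub_keys_1`, `memory_read`, and so on) are hypothetical. With n sub-keys per codebook the memory addresses n × n slots, yet the search only scores 2n sub-keys plus k × k candidate pairs.

```python
import heapq
import itertools
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def product_key_lookup(query, sub_keys_1, sub_keys_2, k):
    """Return the k best (score, slot_index) pairs over the implicit
    Cartesian product of the two codebooks (assumed scoring: dot product)."""
    half = len(query) // 2  # assumption: query length is even
    q1, q2 = query[:half], query[half:]
    # Score each half-query against its own small codebook.
    best_1 = heapq.nlargest(k, ((dot(q1, c), i) for i, c in enumerate(sub_keys_1)))
    best_2 = heapq.nlargest(k, ((dot(q2, c), j) for j, c in enumerate(sub_keys_2)))
    n2 = len(sub_keys_2)
    # The full key (i, j) scores s1 + s2, so the overall top k must lie
    # among the k x k combinations of the per-half top k.
    candidates = [
        (s1 + s2, i * n2 + j)
        for (s1, i), (s2, j) in itertools.product(best_1, best_2)
    ]
    return heapq.nlargest(k, candidates)


def brute_force_lookup(query, sub_keys_1, sub_keys_2, k):
    """Reference search that scores every concatenated key explicitly."""
    n2 = len(sub_keys_2)
    scored = (
        (dot(query, c1 + c2), i * n2 + j)
        for i, c1 in enumerate(sub_keys_1)
        for j, c2 in enumerate(sub_keys_2)
    )
    return heapq.nlargest(k, scored)


def memory_read(query, sub_keys_1, sub_keys_2, values, k):
    """Sparse read: softmax over the k selected scores, then a weighted
    sum of the matching value vectors (an assumed read-out rule)."""
    hits = product_key_lookup(query, sub_keys_1, sub_keys_2, k)
    top = max(score for score, _ in hits)
    weights = [math.exp(score - top) for score, _ in hits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w / total * values[slot][d] for w, (_, slot) in zip(weights, hits))
        for d in range(dim)
    ]
```

For example, with two codebooks of 8 sub-keys each, `product_key_lookup` scores 16 sub-keys and at most k × k pairs instead of all 64 full keys, and it selects the same slots as `brute_force_lookup`.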
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
