Hyperparameter Transfer for Dense Associative Memories
Roi Holtzman, Dmitry Krotov, Boris Hanin

TL;DR
This paper develops hyperparameter transfer methods for Dense Associative Memories (DenseAMs), so that hyperparameters tuned on small models carry over to large ones despite shared weights and rapidly peaking activation functions.
Contribution
It initiates the development of hyperparameter transfer methods for DenseAMs, addressing two features that existing feed-forward methods do not cover: weights shared across layers and within a layer, and rapidly peaking activation functions.
Findings
Hyperparameters tuned on small models effectively transfer to larger DenseAMs.
Theoretical prescriptions align well with empirical results.
Explicit transfer prescriptions make it practical to train DenseAMs at scale using hyperparameters tuned on small models.
Abstract
Dense Associative Memory (DenseAM) is a promising family of AI architectures that is represented by a neural network performing temporal dynamics on an energy landscape. While hyperparameter transfer methods are well-studied for feed-forward networks, these methods have not been developed for settings in which weights are shared across layers and within the layer, which is common in DenseAMs. Additionally, DenseAMs utilize rapidly peaking activation functions that are rarely used in feed-forward architectures. The confluence of these aspects makes DenseAM a challenging framework for using existing methods for hyperparameter transfer. Our work initiates the development of hyperparameter transfer methods for this class of models. We derive explicit prescriptions for how the hyperparameters tuned on small models can be transferred to models trained at scale. We demonstrate excellent…
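To make the terms in the abstract concrete, here is a minimal NumPy sketch of one well-known DenseAM variant, with a log-sum-exp energy and a softmax update. It is an illustration only, not the architecture or the transfer prescriptions from the paper. The names `energy`, `step`, `retrieve`, the inverse temperature `beta`, and the toy sizes are assumptions chosen for this example. The same matrix `xi` is reused at every update step and for both the projection and the readout, which is one simple form of weight sharing, and the softmax plays the role of a rapidly peaking activation.

```python
import numpy as np

def energy(x, xi, beta):
    """Log-sum-exp energy: -(1/beta) * log(sum_mu exp(beta * xi_mu . x)) + 0.5 * |x|^2."""
    z = beta * (xi @ x)
    m = z.max()  # subtract the max for numerical stability
    lse = m + np.log(np.exp(z - m).sum())
    return -lse / beta + 0.5 * (x @ x)

def step(x, xi, beta):
    """One update: x <- xi^T softmax(beta * xi x). The same xi is used twice."""
    z = beta * (xi @ x)
    p = np.exp(z - z.max())
    p /= p.sum()  # softmax: sharply peaked when one overlap dominates
    return xi.T @ p

def retrieve(x0, xi, beta, n_steps=5):
    """Iterate the update with the same weights at every step, recording the energy."""
    x = x0.copy()
    energies = [energy(x, xi, beta)]
    for _ in range(n_steps):
        x = step(x, xi, beta)
        energies.append(energy(x, xi, beta))
    return x, energies

# Toy setup (assumed sizes): 8 random +/-1 patterns in 64 dimensions.
rng = np.random.default_rng(0)
xi = rng.choice([-1.0, 1.0], size=(8, 64))

# Corrupt the first stored pattern by flipping 10 of its 64 entries.
query = xi[0].copy()
query[:10] *= -1

recalled, energies = retrieve(query, xi, beta=1.0)
print("recovered stored pattern:", np.array_equal(np.sign(recalled), xi[0]))
print("energy trace:", np.round(energies, 3))
```

In this sketch the state moves downhill on the energy at every step and settles near the stored pattern. Quantities such as `beta`, the number of patterns, and the dimension are the kind of settings whose scaling a hyperparameter transfer method would need to account for. How the paper actually treats them is not shown here.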
