Sparse Distributed Memory is a Continual Learner

Trenton Bricken; Xander Davies; Deepak Singh; Dmitry Krotov; Gabriel Kreiman

arXiv:2303.11934 · cs.NE · March 28, 2023 · 5 citations

TL;DR

The paper presents a biologically inspired MLP based on Sparse Distributed Memory (SDM) that learns continually without memory replay or task information, and it introduces new methods for training sparse networks.

Contribution

The paper presents a biologically inspired MLP variant based on SDM that is a strong continual learner without memory replay or task information. Its methods for training sparse networks may be broadly applicable.

Findings

01 Every component translated from biology is necessary for continual learning

02 No reliance on memory replay or task information

03 Novel methods for training sparse networks

Abstract

Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.
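
The abstract does not describe the architecture in detail. As a rough illustration of what a sparse MLP hidden layer can look like, the sketch below assumes a Top-K activation applied to bias-free, L2-normalized neurons. These design choices and every name in the code (`l2_normalize`, `top_k_activation`, `sparse_layer`, the toy weights) are assumptions made for illustration; they are not taken from this page or from the official repository.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k_activation(pre_acts, k):
    """Keep the k largest pre-activations and set all others to zero."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values; ties go to the lower index (stable sort).
    order = sorted(range(len(pre_acts)), key=lambda i: -pre_acts[i])
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(pre_acts)]

def sparse_layer(x, weights, k):
    """Assumed layer: cosine similarity between input and each neuron, then Top-K."""
    x_hat = l2_normalize(x)
    sims = [
        sum(w_i * x_i for w_i, x_i in zip(l2_normalize(w), x_hat))
        for w in weights
    ]
    return top_k_activation(sims, k)

# Hypothetical toy setup: 4 neurons, 3 input features, 2 neurons stay active.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
hidden = sparse_layer([2.0, 1.0, 0.0], weights, k=2)
print(hidden)
```

In this toy run only the two neurons whose weight vectors point most nearly in the input's direction stay active. Under the stated assumptions, each input therefore updates only a small subset of neurons, which is one intuition for why sparse activations can reduce interference between tasks.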

Code & Models

Repositories

trentbrick/sdmcontinuallearner (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding · Residual Connection · Dropout · Layer Normalization · Dense Connections