B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory

Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

arXiv:2407.06324 · cs.LG · July 10, 2024

TL;DR

B'MOJO introduces a flexible architecture that combines eidetic and fading memory for foundation models, enabling efficient, scalable inference over long sequences, with stronger associative recall and faster training.

Contribution

The paper presents B'MOJO, a model that seamlessly integrates eidetic and fading memory within a single composable module, extending the capabilities of existing architectures and enabling scalable, efficient long-sequence processing.

Findings

01. Outperforms existing SSMs and hybrid models on associative recall tasks.

02. Achieves language modeling perplexity comparable to similarly sized Transformers and SSMs at scales up to 1.4B parameters.

03. Trains up to 10% faster than similarly sized models.
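Associative recall asks a model to retrieve the value paired with a key that appeared earlier in the sequence. The toy Python sketch below only illustrates that task format; it is not the benchmark used in the paper. The helpers `make_recall_example` and `windowed_recall` are hypothetical, and the letter-key/digit-value token format is an assumption. It shows why exact (eidetic) storage matters for this task: a lookup restricted to a recent window answers correctly only while the queried pair still lies inside that window.

```python
import random
from typing import List, Optional, Tuple


def make_recall_example(num_pairs: int, seed: int = 0) -> Tuple[List[str], str, str]:
    """Build a toy prompt 'k1 v1 k2 v2 ...' plus a query key (hypothetical format).

    Returns (tokens, query_key, expected_value).
    """
    rng = random.Random(seed)
    keys = rng.sample("abcdefghijklmnopqrstuvwxyz", num_pairs)  # distinct letter keys
    values = [str(rng.randrange(10)) for _ in keys]             # digit values
    tokens = [tok for pair in zip(keys, values) for tok in pair]
    query = rng.choice(keys)
    return tokens, query, values[keys.index(query)]


def windowed_recall(tokens: List[str], query: str, window: int) -> Optional[str]:
    """Answer using only the last `window` tokens; None if the key is outside that span."""
    recent = tokens[-window:] if window > 0 else []  # guard: tokens[-0:] would be the whole list
    for i in range(len(recent) - 1):
        if recent[i] == query:
            return recent[i + 1]  # the value follows its key
    return None


tokens, query, answer = make_recall_example(num_pairs=8, seed=0)
print(windowed_recall(tokens, query, window=len(tokens)) == answer)  # True: the full span sees every pair
print(windowed_recall(tokens, query, window=2))  # the value only if `query` is the last key, else None
```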

Abstract

We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent hybrid architectures have combined eidetic and fading memory, but with limitations that do not allow the designer or the learning process to seamlessly modulate the two, nor to extend the eidetic memory span. We leverage ideas from Stochastic Realization Theory to develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an elementary composable module. The overall architecture can be used to implement models that can access short-term eidetic memory…
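The abstract contrasts eidetic memory (exact storage over a finite span, like a Transformer's context) with fading memory (a state that summarizes an unbounded past, like an SSM). The toy Python sketch below illustrates only that contrast, under stated assumptions: the class `HybridMemory`, its parameters `window` and `decay`, and the scalar state update are hypothetical choices for illustration and are not the B'MOJO module described in the paper.

```python
from collections import deque
from typing import Optional


class HybridMemory:
    """Toy memory with an eidetic part (exact sliding window) and a fading part (decaying state).

    Hypothetical illustration only; this is not the B'MOJO module.
    """

    def __init__(self, window: int, decay: float) -> None:
        # Eidetic memory: the last `window` inputs, stored exactly; older ones are evicted.
        self.window: deque = deque(maxlen=window)
        # Fading memory: a scalar linear state; old inputs contribute with geometrically shrinking weight.
        self.decay = decay  # assumed to lie in (0, 1)
        self.state = 0.0

    def update(self, x: float) -> None:
        self.window.append(x)
        # s_t = a * s_{t-1} + (1 - a) * x_t: every past input is kept, but weighted by a^lag.
        self.state = self.decay * self.state + (1.0 - self.decay) * x

    def recall(self, lag: int) -> Optional[float]:
        """Exact input seen `lag` steps ago (0 = most recent), or None once it has left the window."""
        if 0 <= lag < len(self.window):
            return self.window[-1 - lag]
        return None


mem = HybridMemory(window=4, decay=0.5)
mem.update(1.0)          # one "spike" input ...
for _ in range(10):
    mem.update(0.0)      # ... followed by ten zeros
print(mem.recall(10))    # None: the spike is outside the 4-step eidetic window
print(mem.state)         # 0.00048828125 (= 0.5 ** 11): only a faded trace of the spike survives
```

The window gives exact but bounded recall; the state has unbounded reach but loses precision. The abstract's point is that B'MOJO lets the designer or the learning process modulate between the two, which this fixed toy does not attempt.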

Videos

B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory (SlidesLive)

Taxonomy

Topics: Modular Robots and Swarm Intelligence

Methods: Transductive Inference