B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

TL;DR
B'MOJO introduces a flexible architecture that combines eidetic and fading memory for foundation models, enabling efficient, scalable inference over long sequences with improved performance and training speed.
Contribution
The paper presents B'MOJO, a novel model that seamlessly integrates eidetic and fading memory, extending capabilities of existing architectures and enabling scalable, efficient long-sequence processing.
Findings
Outperforms existing SSMs and hybrid models on associative recall tasks.
Achieves language modeling perplexity comparable to Transformers and SSMs up to 1.4B parameters.
Up to 10% faster training compared to similar-sized models.
Abstract
We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent hybrid architectures have combined eidetic and fading memory, but with limitations that do not allow the designer or the learning process to seamlessly modulate the two, nor to extend the eidetic memory span. We leverage ideas from Stochastic Realization Theory to develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an elementary composable module. The overall architecture can be used to implement models that can access short-term eidetic memory…
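The two kinds of memory the abstract contrasts can be illustrated with a toy sketch. This is not the B'MOJO module or any code from the paper; the function names, the scalar decay recurrence, and the window size are assumptions chosen only to show the difference. A fading memory keeps a trace of every past input, with weight shrinking geometrically with lag, while an eidetic memory keeps the most recent inputs exactly and drops anything older.

```python
from collections import deque


def fading_memory(inputs, decay=0.5):
    """Toy fading memory (hypothetical): s_t = decay * s_{t-1} + u_t.

    Every past input still contributes, but with weight decay**lag,
    so old information fades without ever being cut off.
    """
    state = 0.0
    states = []
    for u in inputs:
        state = decay * state + u
        states.append(state)
    return states


def eidetic_window(inputs, span=3):
    """Toy eidetic memory (hypothetical): keep the last `span` inputs verbatim.

    Inputs inside the window are stored exactly; older ones are lost entirely.
    """
    window = deque(maxlen=span)
    snapshots = []
    for u in inputs:
        window.append(u)
        snapshots.append(list(window))
    return snapshots


# A single impulse followed by zeros makes the difference easy to see.
inputs = [1.0, 0.0, 0.0, 0.0, 0.0]
fading = fading_memory(inputs, decay=0.5)
windows = eidetic_window(inputs, span=3)

print(fading)   # the impulse decays geometrically but never vanishes exactly
print(windows)  # the impulse is exact while inside the window, then gone
```

With a window of three, the impulse is stored exactly for three steps and then disappears, whereas the recurrent state still carries a small trace of it at every later step. Letting a model modulate between these two behaviors is the motivation the abstract describes.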
Taxonomy
Topics: Modular Robots and Swarm Intelligence
Methods: Transductive Inference
