Superposition in Transformers: A Novel Way of Building Mixture of Experts
Ayoub Ben Chaliah, Hela Dellagi

TL;DR
This paper introduces Superposition in Transformers, a new architecture that uses autoencoders and B-spline-based blending to mitigate catastrophic forgetting, enabling models to retain their original knowledge while integrating domain-specific expertise within a shared parameter space.
Contribution
The paper proposes a superposition method that uses autoencoders and B-spline blending to preserve a model's existing capabilities and add domain expertise without overwriting what it already knows.
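The B-spline blending idea can be illustrated with a small sketch. It assumes, for illustration only, one blending coefficient α per transformer layer, obtained by evaluating a clamped cubic B-spline over normalized layer depth with the standard Cox-de Boor recursion. The control points, the per-layer parameterization, and the names `bspline_basis`, `clamped_knots`, and `blend_coefficients` are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: per-layer blending coefficients from a clamped B-spline.
# Control points, degree, and knot layout are illustrative assumptions.


def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th B-spline basis of degree k at t."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Include the right endpoint in the last non-empty knot span.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def clamped_knots(n_ctrl, degree):
    """Uniform clamped knot vector on [0, 1]; requires n_ctrl >= degree + 1."""
    n_inner = n_ctrl - degree - 1
    inner = [(j + 1) / (n_inner + 1) for j in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def blend_coefficients(control_points, n_layers, degree=3):
    """Evaluate the spline at each layer's normalized depth t in [0, 1]."""
    knots = clamped_knots(len(control_points), degree)
    alphas = []
    for layer in range(n_layers):
        t = layer / (n_layers - 1) if n_layers > 1 else 0.0
        value = sum(c * bspline_basis(i, degree, t, knots)
                    for i, c in enumerate(control_points))
        # Clip to [0, 1] so alpha stays a valid mixing weight.
        alphas.append(min(1.0, max(0.0, value)))
    return alphas


alphas = blend_coefficients([1.0, 0.8, 0.5, 0.2, 0.0], n_layers=12)
print([round(a, 3) for a in alphas])  # decreases from 1.0 (layer 0) to 0.0 (layer 11)
```

A clamped knot vector makes the curve pass through its first and last control points, so the outermost layers receive exactly those coefficients, while the smooth interior keeps adjacent layers from getting abruptly different weights.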
Findings
Effective mitigation of catastrophic forgetting.
Supports dynamic switching between model states.
Preserves original model performance while adding expertise.
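One way to picture dynamic switching is a blending coefficient that acts as a runtime dial between the base and fine-tuned states. The toy below is a hypothetical sketch, not the authors' implementation. It keeps two separate weight matrices for readability, which simplifies away the shared-parameter design described in the abstract, and it assumes a convex combination α·h_base + (1 − α)·h_ft, where α = 1 selects the base model and α = 0 selects the fine-tuned expert. `SuperposedLayer` and its mode names are invented for illustration.

```python
# Hypothetical toy illustrating runtime switching via a blending coefficient.
# Two separate weight matrices are used for clarity; this is a simplification
# and NOT the paper's shared-parameter implementation.


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def blend(h_base, h_ft, alpha):
    """Convex combination: alpha=1 keeps the base state, alpha=0 the fine-tuned one."""
    return [alpha * b + (1.0 - alpha) * f for b, f in zip(h_base, h_ft)]


class SuperposedLayer:
    # Hypothetical mode names mapped to blending coefficients.
    MODES = {"base": 1.0, "expert": 0.0}

    def __init__(self, w_base, w_ft, alpha=0.5):
        self.w_base = w_base
        self.w_ft = w_ft
        self.alpha = alpha

    def switch(self, mode):
        """Change the active state at inference time; no weights are modified."""
        self.alpha = self.MODES[mode]

    def forward(self, x):
        return blend(matvec(self.w_base, x), matvec(self.w_ft, x), self.alpha)


layer = SuperposedLayer([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
print(layer.forward([1.0, 3.0]))  # blended: [1.5, 4.5]
layer.switch("base")
print(layer.forward([1.0, 3.0]))  # base only: [1.0, 3.0]
layer.switch("expert")
print(layer.forward([1.0, 3.0]))  # fine-tuned only: [2.0, 6.0]
```

Because switching only changes α, moving between states costs nothing beyond the forward pass itself.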
Abstract
Catastrophic forgetting remains a major challenge when adapting large language models (LLMs) to new tasks or domains. Conventional fine-tuning often overwrites existing knowledge, causing performance degradation on original tasks. We introduce Superposition in Transformers, a novel architecture that leverages autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. By using B-spline-based blending coefficients and autoencoders that adaptively reconstruct hidden states based on the input data distribution, our method effectively mitigates catastrophic forgetting and enables a new paradigm of "in-model" superposition. This approach preserves original model capabilities while allowing compact domain-specific expertise to be added, and it supports dynamic switching between model states during inference.
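The autoencoder component can be illustrated generically. The idea is to compress a hidden state to a lower-dimensional code and reconstruct it, training on samples from the input distribution so that reconstruction error falls. The sketch below is a minimal stand-in built on stated assumptions: a linear autoencoder trained by plain SGD on synthetic 4-dimensional "hidden states" that lie in a 2-dimensional subspace. The architecture, dimensions, learning rate, data, and the names `LinearAutoencoder` and `make_hidden_states` are hypothetical. The paper's autoencoders operate on real transformer hidden states and may differ substantially.

```python
import random

# Hypothetical sketch: a tiny linear autoencoder (d -> k -> d) trained with
# plain SGD on the loss L = 0.5 * ||dec @ enc @ h - h||^2.


class LinearAutoencoder:
    def __init__(self, d, k, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(k)]
        self.dec = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]

    @staticmethod
    def _matvec(w, x):
        return [sum(a * b for a, b in zip(row, x)) for row in w]

    def reconstruct(self, h):
        return self._matvec(self.dec, self._matvec(self.enc, h))

    def step(self, h, lr):
        z = self._matvec(self.enc, h)               # code, length k
        r = self._matvec(self.dec, z)               # reconstruction, length d
        err = [ri - hi for ri, hi in zip(r, h)]     # dL/dr
        # Backpropagate to the code before touching the decoder: dL/dz = dec^T err.
        gz = [sum(self.dec[i][j] * err[i] for i in range(len(err)))
              for j in range(len(z))]
        for i in range(len(self.dec)):              # dL/ddec[i][j] = err[i] * z[j]
            for j in range(len(z)):
                self.dec[i][j] -= lr * err[i] * z[j]
        for j in range(len(self.enc)):              # dL/denc[j][m] = gz[j] * h[m]
            for m in range(len(h)):
                self.enc[j][m] -= lr * gz[j] * h[m]
        return 0.5 * sum(e * e for e in err)


def make_hidden_states(n, seed=1):
    """Synthetic 4-d 'hidden states' lying in a 2-d subspace (hypothetical data)."""
    rng = random.Random(seed)
    u, v = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]
    out = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        out.append([a * ui + b * vi for ui, vi in zip(u, v)])
    return out


def mean_loss(ae, data):
    total = 0.0
    for h in data:
        r = ae.reconstruct(h)
        total += 0.5 * sum((ri - hi) ** 2 for ri, hi in zip(r, h))
    return total / len(data)


data = make_hidden_states(200)
ae = LinearAutoencoder(d=4, k=2)
initial = mean_loss(ae, data)
for epoch in range(15):
    for h in data:
        ae.step(h, lr=0.05)
final = mean_loss(ae, data)
print(f"mean reconstruction loss: {initial:.4f} -> {final:.6f}")
```

Because the synthetic data are exactly rank 2, a 2-dimensional code is enough to push the reconstruction error well below its starting value. Real hidden states would not be this tidy.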
