Homeostasis and Sparsity in Transformer

Leonid Kotyuzanskiy; Artem Klimov

arXiv:2412.00503 · cs.LG · December 17, 2024

TL;DR

This paper introduces homeostasis-inspired sparsity mechanisms into transformer architectures, improving their performance on language translation tasks by dynamically adjusting activation distributions.

Contribution

The paper proposes homeostasis and sparsity techniques, such as RFB-kWTA and "Smart" Inhibition, and integrates them into the transformer to improve its efficiency and accuracy.

Findings

1. Achieved a BLEU score of 0.3062 on the Multi30K dataset.

2. Outperformed the classical transformer and dropout-only models.

3. Demonstrated the effectiveness of homeostasis mechanisms in transformers.

Abstract

The transformer architecture has become an integral part of modern neural networks, playing a crucial role in a variety of tasks such as text generation, machine translation, and image and audio processing. There is also an alternative approach to building intelligent systems, proposed by Jeff Hawkins and inspired by processes occurring in the neocortex. In this article we combine some of these ideas: we propose using homeostasis mechanisms, such as RFB-kWTA and "Smart" Inhibition, in the attention mechanism of the transformer and at the output of the transformer block, and we conduct an experiment that introduces sparse distributed representations at various points of the transformer. RFB-kWTA utilizes statistics of layer activations across time to adjust the entire layer, enhancing the values of rare activations while…
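
To make the idea of "statistics of layer activations across time" more concrete, here is a minimal sketch of a k-winners-take-all layer with frequency-based boosting. It is not the paper's RFB-kWTA formula, which the truncated abstract does not give. The class name `BoostedKWTA`, the exponential boost rule, and the parameters `k`, `beta`, and `momentum` are all assumptions, loosely modeled on generic duty-cycle boosting, and the sketch assumes non-negative activations.

```python
import math


class BoostedKWTA:
    """Hypothetical k-winners-take-all layer with duty-cycle boosting.

    Illustrative only: the boost rule and all parameter names are
    assumptions, not the formulation used in the paper.
    """

    def __init__(self, size, k, beta=1.0, momentum=0.9):
        self.size = size
        self.k = k
        self.beta = beta            # boost strength (assumed hyperparameter)
        self.momentum = momentum    # smoothing factor for the running statistics
        self.target = k / size      # fraction of steps each unit should ideally win
        self.duty = [self.target] * size  # running win frequency per unit

    def forward(self, x):
        # Rare winners get a factor > 1, frequent winners a factor < 1.
        # Assumes x is non-negative (for example, post-ReLU activations).
        boosted = [
            v * math.exp(self.beta * (self.target - d))
            for v, d in zip(x, self.duty)
        ]
        # Keep the k largest boosted activations and zero out the rest.
        order = sorted(range(self.size), key=lambda i: boosted[i], reverse=True)
        winners = set(order[: self.k])
        out = [boosted[i] if i in winners else 0.0 for i in range(self.size)]
        # Update the running win frequency of every unit.
        self.duty = [
            self.momentum * d + (1.0 - self.momentum) * (1.0 if i in winners else 0.0)
            for i, d in enumerate(self.duty)
        ]
        return out


# Feed the same input twice: the first winner is suppressed on the second step.
layer = BoostedKWTA(size=4, k=1, beta=5.0, momentum=0.5)
x = [1.0, 0.9, 0.1, 0.1]
first = layer.forward(x)
second = layer.forward(x)
```

With these illustrative settings, unit 0 wins the first step, its running win frequency rises above the target, and the runner-up (unit 1) wins the second step. This is the homeostatic effect the abstract alludes to: activity is spread across units instead of concentrating on a few.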

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · Dropout