NoiseFormer -- Noise Diffused Symmetric Attention Transformer
Phani Kumar Nyshadham, Jyothendra Varma Polisetty V R K, Aditya Rathore

TL;DR
This paper introduces the Noise Diffused Symmetric Attention Transformer, a novel architecture that enhances symmetric attention to improve performance while reducing model size. It is validated on GPT2 and shows better accuracy and efficiency.
Contribution
It proposes a unified model architecture that enhances symmetric attention with noise diffusion, achieving better accuracy than traditional symmetric attention methods while keeping the model size reduced.
Findings
Performance gains that place the model between plain symmetric attention and the GPT2 base model.
Significant model size reduction.
Improved accuracy on GLUE benchmark tasks.
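To give a rough sense of where a model size reduction can come from, the sketch below estimates the parameters saved in GPT2 base if every attention layer shares one projection for queries and keys, so that the separate key projection (weight and bias) is dropped. This is a hypothetical back-of-the-envelope calculation under that assumption, not the paper's reported figure; the constants are the standard GPT2 base configuration (hidden size 768, 12 layers, 124,439,808 parameters with tied embeddings), and the function name is illustrative.

```python
# Hypothetical estimate: parameters saved in GPT2 base if each attention
# layer shares a single query/key projection (assumption, not the paper's number).
D_MODEL = 768                    # GPT2 base hidden size
N_LAYERS = 12                    # GPT2 base transformer blocks
GPT2_BASE_PARAMS = 124_439_808   # GPT2 base total parameters (tied embeddings)


def key_projection_params(d_model: int, n_layers: int) -> int:
    """Weights plus bias of one d_model x d_model key projection, summed over layers."""
    return n_layers * (d_model * d_model + d_model)


saved = key_projection_params(D_MODEL, N_LAYERS)
print(f"saved parameters: {saved:,}")                         # saved parameters: 7,087,104
print(f"fraction of GPT2 base: {saved / GPT2_BASE_PARAMS:.1%}")  # fraction of GPT2 base: 5.7%
```

Under this assumption the saving is a modest few percent of GPT2 base; any larger reduction would have to come from design choices not described here.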
Abstract
The Transformer architecture has been a very successful long runner in the field of Deep Learning (DL) and Large Language Models (LLMs) because of its powerful attention-based learning and parallel-natured architecture. As models grow gigantic in memory footprint, they become difficult to fit on a single device such as a GPU or an AI accelerator, which creates the need for multiple computing devices and thereby escalates the computing cost. This increased training/inference cost has paved the way for efficient model size (parametric) reduction using Sparse Attention techniques. In this paper, we begin by analyzing one Sparse Attention technique, Symmetric Dot-Product Attention (referred to as Symmetric Attention), and propose a novel unified model architecture called Noise Diffused Symmetric Attention Transformer to enhance the model's performance. While maintaining the…
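For readers unfamiliar with symmetric dot-product attention, here is a minimal NumPy sketch of one common formulation, assuming queries and keys share a single projection matrix so that the score matrix is symmetric before masking. It covers only the symmetric attention baseline with a GPT2-style causal mask; it does not implement NoiseFormer's noise-diffusion component, which the text above does not specify. The function and parameter names (`symmetric_causal_attention`, `w_qk`, `w_v`) are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def symmetric_causal_attention(x: np.ndarray, w_qk: np.ndarray, w_v: np.ndarray):
    """Single-head attention where queries and keys share one projection (assumed form).

    x:    (seq_len, d_model) token representations
    w_qk: (d_model, d_head) shared query/key projection, replacing separate W_Q and W_K
    w_v:  (d_model, d_head) value projection
    Returns the attention output and the raw (unmasked) score matrix.
    """
    qk = x @ w_qk                                  # one projection acts as both Q and K
    scores = qk @ qk.T / np.sqrt(w_qk.shape[1])    # symmetric: scores == scores.T
    seq_len = x.shape[0]
    # GPT2-style causal mask: position i may attend only to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    weights = softmax(np.where(mask, -np.inf, scores))
    return weights @ (x @ w_v), scores


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
w_qk = rng.standard_normal((16, 8))
w_v = rng.standard_normal((16, 8))
out, scores = symmetric_causal_attention(x, w_qk, w_v)
print(out.shape)                      # (5, 8)
print(np.allclose(scores, scores.T))  # True
```

Sharing `w_qk` removes one d_model x d_head matrix per head, which is the source of the parameter saving; the causal mask is applied after the symmetric scores are formed, so each token still attends only to earlier positions.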
