Towards smaller, faster decoder-only transformers: Architectural variants and their implications

Sathya Krishnan Suresh; Shunmugapriya P

arXiv:2404.14462 · cs.LG · October 10, 2024 · 1 cites

TL;DR

This paper introduces three architectural variants of the decoder-only transformer (ParallelGPT, LinearGPT, and ConvGPT) that perform comparably to the conventional architecture on language generation while being smaller and faster to train, with open-source code provided.

Contribution

The study proposes three modifications to the decoder-only transformer architecture that reduce model size and training time while keeping language generation performance comparable to the conventional architecture.

Findings

01 Comparable performance to the conventional decoder-only transformer in language generation

02 Reduced model sizes and faster training times

03 Open-source implementation available

Abstract

In recent times, research on Large Language Models (LLMs) has grown exponentially, predominantly focusing on models underpinned by the transformer architecture, as established by [1], and further developed through the decoder-only variations by [2]. Contemporary efforts in this field primarily aim to enhance model capabilities by scaling up both the architecture and the data volumes used during training. However, exploration into reducing these model sizes while preserving their efficacy remains scant. In this study, we introduce three modifications to the decoder-only transformer architecture, namely ParallelGPT (pgpt), LinearGPT (lgpt), and ConvGPT (cgpt). These variants demonstrate comparable performance to the conventional architecture in language generation, yet benefit from reduced model sizes and faster training processes. We open-source the model weights and the complete…

Code & Models

Repositories

SkAndMl/gpt-variations · PyTorch · Official

Taxonomy

Topics: Embedded Systems Design Techniques · Advanced Data Storage Technologies · VLSI and FPGA Design Techniques