Comparative Study of Large Language Model Architectures on Frontier

Junqi Yin; Avishek Bose; Guojing Cong; Isaac Lyngaas; Quentin Anthony

arXiv:2402.00691 · cs.DC · February 2, 2024 · 1 citation

TL;DR

This study compares the GPT-NeoX and LLaMA architectures, both trained on the same materials science corpus on the Frontier supercomputer. It reports on their training and downstream performance and their efficiency, with the aim of guiding future LLM development on HPC systems.

Contribution

The paper provides a controlled comparison of two open-source GPT architectures on an HPC system, covering performance and efficiency, and introduces a new method for architecture design.

Findings

01. Achieved state-of-the-art results on a materials science benchmark.

02. Compared the computational and energy efficiency of the two models.

03. Proposed a new, efficient method for architecture design.

Abstract

Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture, spawning numerous variants. However, these variants have undergone pre-training under diverse conditions, including variations in input data, data preprocessing, and training methodologies, resulting in a lack of controlled comparative studies. Here we meticulously examine two prominent open-sourced GPT architectures, GPT-NeoX and LLaMA, leveraging the computational power of Frontier, the world's first Exascale supercomputer. Employing the same materials science text corpus and a comprehensive end-to-end pipeline, we conduct a comparative analysis of their training and downstream performance. Our efforts culminate in achieving state-of-the-art performance on a challenging materials…

Code & Models

Repositories

eleutherai/gpt-neox (PyTorch, official)

Models

akswelh/NEOX (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling