TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication

Zongwu Wang; Fangxin Liu; Mingshuai Li; Li Jiang

arXiv:2412.20501 · cs.DC · December 31, 2024

TL;DR

TokenRing introduces a bidirectional communication framework that significantly improves the scalability and efficiency of parallelizing long-context LLMs, reducing communication overhead and enhancing throughput.

Contribution

The paper proposes a novel fine-grained parallel framework that leverages bidirectional P2P communication and concurrent data transmission to optimize distributed Transformer performance.
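The link-utilization argument can be made concrete with a minimal Python sketch. It assumes that N GPUs form a ring, that each neighboring pair shares one full-duplex link, and that every rank exchanges a fixed-size block per step. It is not code from the aca-lab-sjtu/token-ring repository, and all helper names are hypothetical.

The sketch does two things:

- It counts the directed links that are busy during one step. A unidirectional ring, as in Ring-Attention, keeps only one direction of each link busy. Sending to both neighbors concurrently uses both directions.
- It checks that forwarding blocks one hop per step delivers every block to every rank after N - 1 steps.

```python
def ring_peers(rank: int, world: int) -> tuple[int, int]:
    """Return (next_rank, prev_rank) for `rank` on a ring of `world` devices."""
    return (rank + 1) % world, (rank - 1) % world


def busy_links(world: int, bidirectional: bool) -> set[tuple[int, int]]:
    """Directed links (src, dst) carrying data during one ring step (hypothetical model)."""
    busy = set()
    for rank in range(world):
        nxt, prv = ring_peers(rank, world)
        busy.add((rank, nxt))          # forward send, as in a unidirectional ring
        if bidirectional:
            busy.add((rank, prv))      # concurrent send in the opposite direction
    return busy


def link_utilization(world: int, bidirectional: bool) -> float:
    """Fraction of the 2 * world directed ring links in use (assumes world >= 3)."""
    return len(busy_links(world, bidirectional)) / (2 * world)


def forward_circulation(world: int) -> list[set[int]]:
    """Blocks seen by each rank when every rank forwards its current block each step."""
    held = list(range(world))              # rank r starts with block r
    seen = [{r} for r in range(world)]
    for _ in range(world - 1):
        # Rank r receives from its previous peer, so blocks move one hop forward.
        held = [held[ring_peers(r, world)[1]] for r in range(world)]
        for r, block in enumerate(held):
            seen[r].add(block)
    return seen


print(link_utilization(8, bidirectional=False))  # 0.5
print(link_utilization(8, bidirectional=True))   # 1.0
```

The model ignores latency, link contention, and the choice of which tensors travel in each direction. Those details belong to the paper's actual design, not to this sketch.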

Findings

01. Reduces communication overhead in long-sequence LLMs
02. Improves throughput and scalability of distributed Transformers
03. Enhances compatibility with various multi-GPU interconnects

Abstract

Efficient parallelization of Large Language Models (LLMs) with long sequences is essential but challenging due to their significant computational and memory demands, particularly stemming from communication bottlenecks in attention mechanisms. While sequence parallelism (SP) has been introduced as a potential solution, existing methods often suffer from limited scalability or inefficiency, limiting their effectiveness. Ring-Attention demonstrates the potential for scaling sequence processing but faces significant limitations due to its reliance on peer-to-peer (P2P) communication and inefficient utilization of network resources. As the degree of SP increases, the quadratic decrease in computation time per step contrasts sharply with the linear reduction in communication volume, exacerbating communication bottlenecks. To address these challenges, we propose TokenRing, a fine-grained…
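The scaling argument in the abstract can be checked with a back-of-the-envelope model. This sketch rests on the following assumptions:

- In each Ring-Attention-style step, a rank computes non-causal attention between one query chunk and one key/value chunk of length S/N.
- The rank passes one K chunk and one V chunk to its neighbor.
- Softmax cost, causal masking, and latency are ignored.

Under these assumptions, compute per step shrinks as (S/N)^2 while traffic shrinks as S/N, so the FLOP-per-byte ratio falls as 1/N. Several inputs are hypothetical:

- The model shape (32 heads, head_dim 128).
- The hardware figures (300 TFLOP/s, 50 GB/s per link direction).
- The `directions` parameter, which idealizes an even split of the payload over both link directions and does not describe TokenRing's actual data layout.

```python
import math


def per_step_cost(seq_len: int, sp_degree: int, head_dim: int, num_heads: int,
                  bytes_per_elem: int = 2) -> tuple[float, float]:
    """FLOPs and bytes sent by one rank in one ring step (non-causal attention)."""
    chunk = seq_len / sp_degree
    # QK^T and PV for one (Q chunk, KV chunk) pair: about 2 * chunk^2 * d FLOPs each, per head.
    flops = 4 * chunk * chunk * head_dim * num_heads
    # One K chunk and one V chunk are passed to the neighbor.
    comm_bytes = 2 * chunk * head_dim * num_heads * bytes_per_elem
    return flops, comm_bytes


def max_sp_degree_hiding_comm(seq_len: int, flops_per_s: float, bytes_per_s: float,
                              bytes_per_elem: int = 2, directions: int = 1) -> int:
    """Largest SP degree N for which per-step compute time >= communication time.

    compute_time / comm_time = (2 * chunk / bytes_per_elem) * (bw / flops_per_s),
    so the condition holds while chunk >= bytes_per_elem * flops_per_s / (2 * bw).
    `directions=2` idealizes splitting the payload over both link directions.
    """
    effective_bw = bytes_per_s * directions
    return math.floor(2 * seq_len * effective_bw / (bytes_per_elem * flops_per_s))


f8, b8 = per_step_cost(131072, 8, 128, 32)
f16, b16 = per_step_cost(131072, 16, 128, 32)
print((f8 / b8) / (f16 / b16))                                        # 2.0
print(max_sp_degree_hiding_comm(131072, 300e12, 50e9))                # 21
print(max_sp_degree_hiding_comm(131072, 300e12, 50e9, directions=2))  # 43
```

Under these assumptions, doubling the number of usable link directions doubles the largest SP degree at which communication can still be hidden behind computation. Real overlap also depends on kernel efficiency and latency.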


Code & Models

Repositories

aca-lab-sjtu/token-ring
PyTorch · Official

