TL;DR
fabric-lib provides a hardware-agnostic, high-throughput point-to-point communication library for large language model systems, enabling flexible, portable, and efficient data transfer across diverse NICs.
Contribution
The paper introduces fabric-lib, an abstraction layer that unifies the functionality of common NICs behind one interface for LLM systems, improving portability and performance.
Findings
Achieves 400 Gbps peak throughput on NVIDIA ConnectX-7 and AWS EFA.
Enables disaggregated inference with dynamic scaling in production systems.
Reduces latency for trillion-parameter RL weight updates to 1.3 seconds.
Abstract
Emerging Large Language Model (LLM) system patterns, such as disaggregated inference, Mixture-of-Experts (MoE) routing, and asynchronous reinforcement fine-tuning, require flexible point-to-point communication beyond simple collectives. Existing implementations are locked to specific Network Interface Controllers (NICs), hindering integration into inference engines and portability across hardware providers. We present fabric-lib, which bridges the functionality of common NICs to expose a uniform interface. fabric-lib exposes one-sided WriteImm operations with an ImmCounter primitive for completion notification, without ordering assumptions of network transport, transparently managing multiple NICs per GPU. We demonstrate peak throughput of 400 Gbps on both NVIDIA ConnectX-7 and AWS Elastic Fabric Adapter (EFA). We showcase fabric-lib through three production systems: (1) KvCache transfer…
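The abstract's idea of counting immediates instead of relying on delivery order can be illustrated with a small simulation. This is a minimal sketch, not fabric-lib's actual API: the `ImmCounter` class, its methods, and the `transfer` helper are hypothetical names made up for this example, and a plain `bytearray` stands in for registered receiver memory. The receiver is assumed to know how many writes to expect for a given immediate value.

```python
import random
from collections import defaultdict


class ImmCounter:
    """Hypothetical receiver-side counter keyed by immediate value."""

    def __init__(self):
        self.counts = defaultdict(int)  # writes seen so far, per immediate
        self.expected = {}              # writes required, per immediate

    def expect(self, imm, n):
        # Assumption: the receiver learns the expected write count out of band.
        self.expected[imm] = n

    def on_write_imm(self, imm):
        # Called once for each arriving write that carries this immediate.
        self.counts[imm] += 1

    def done(self, imm):
        # Complete once the count is reached, regardless of arrival order.
        return imm in self.expected and self.counts[imm] >= self.expected[imm]


def transfer(payload, chunk_size, imm, dest, counter, rng):
    """Simulate one-sided writes of payload chunks that land in arbitrary order."""
    chunks = [(off, payload[off:off + chunk_size])
              for off in range(0, len(payload), chunk_size)]
    counter.expect(imm, len(chunks))
    rng.shuffle(chunks)  # model a transport with no ordering guarantee
    for off, data in chunks:
        dest[off:off + len(data)] = data  # each write targets its own offset
        counter.on_write_imm(imm)
    return len(chunks)


payload = bytes(range(256)) * 4
dest = bytearray(len(payload))
counter = ImmCounter()
num_chunks = transfer(payload, 100, imm=7, dest=dest,
                      counter=counter, rng=random.Random(0))
```

Because every chunk carries its own destination offset and completion depends only on the count, the receiver can treat the buffer as ready once `counter.done(7)` is true, whatever order the chunks arrived in.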
