FengHuang: Next-Generation Memory Orchestration for AI Inferencing
Jiamin Li, Lei Qu, Tao Zhang, Grigory Chirkov, Shuotao Xu, Peng Cheng, Lidong Zhou

TL;DR
FengHuang introduces a scalable, disaggregated AI infrastructure platform that significantly improves memory utilization, communication speed, and cost-efficiency for large language model inference workloads.
Contribution
The paper proposes the FengHuang platform, a novel multi-tier shared-memory architecture with active tensor paging and near-memory compute, addressing scalability limitations of traditional GPU-centric AI inference systems.
Findings
Achieves up to 93% local memory capacity reduction
Enables 50% GPU compute savings
Provides 16x to 70x faster inter-GPU communication
Abstract
This document presents a vision for a novel AI infrastructure design that has been initially validated through inference simulations on state-of-the-art large language models. Advancements in deep learning and specialized hardware have driven the rapid growth of large language models (LLMs) and generative AI systems. However, traditional GPU-centric architectures face scalability challenges for inference workloads due to limitations in memory capacity, bandwidth, and interconnect scaling. To address these issues, the FengHuang Platform, a disaggregated AI infrastructure platform, is proposed to overcome memory and communication scaling limits for AI inference. FengHuang features a multi-tier shared-memory architecture combining high-speed local memory with centralized disaggregated remote memory, enhanced by active tensor paging and near-memory compute for tensor operations. Simulations…
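The abstract names a multi-tier shared-memory design with active tensor paging but does not spell out the mechanism here. The following is a minimal toy sketch of the general idea, not the authors' implementation: a small local tier holds only the layer being computed plus the next one, while all weights live in a larger remote tier. The class `TieredMemory`, the function `run_inference`, the layer count, and the byte sizes are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy model (assumed): a small local tier backed by a large remote tier."""
    local_capacity: int                           # bytes available locally
    remote: dict = field(default_factory=dict)    # tensor name -> size in bytes
    local: dict = field(default_factory=dict)     # tensors resident locally
    peak_local: int = 0                           # high-water mark of local use

    def page_in(self, name: str) -> None:
        """Copy a tensor from the remote tier into local memory."""
        if name in self.local:
            return
        size = self.remote[name]
        if sum(self.local.values()) + size > self.local_capacity:
            raise MemoryError(f"local tier too small for {name}")
        self.local[name] = size
        self.peak_local = max(self.peak_local, sum(self.local.values()))

    def page_out(self, name: str) -> None:
        """Drop a tensor from local memory; the remote copy remains."""
        self.local.pop(name, None)


def run_inference(mem: TieredMemory, layers: list[str]) -> int:
    """Visit layers in order, prefetching layer i+1 while layer i 'computes'."""
    if layers:
        mem.page_in(layers[0])
    for i, name in enumerate(layers):
        if i + 1 < len(layers):
            mem.page_in(layers[i + 1])   # prefetch the next layer's weights
        assert name in mem.local         # stand-in for the real layer compute
        mem.page_out(name)               # weights no longer needed locally
    return mem.peak_local


# Illustrative sizes only: 8 layers of 100 bytes each, local tier fits 2 layers.
layer_names = [f"layer{i}" for i in range(8)]
mem = TieredMemory(local_capacity=200, remote={n: 100 for n in layer_names})
peak = run_inference(mem, layer_names)
total = sum(mem.remote.values())
print(peak, total)
```

In this toy setup the local tier never holds more than two layers at once, so its required capacity depends on layer size rather than on total model size. A real system would also need to overlap transfers with compute and account for activations and KV cache, which this sketch ignores.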
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
