CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework
Xiaofeng Li, Bin Ren, Xipeng Shen, Yanzhi Wang

TL;DR
XGen is a comprehensive optimizing framework that enhances DNN performance on edge devices through cross-layer, cooperative optimizations, enabling faster inference without accuracy loss.
Contribution
It introduces a full-stack, AI-oriented optimization approach that effectively bridges the gap between DNN demands and edge device capabilities, supporting complex models like transformers.
Findings
XGen makes DNN inference several times faster.
Supports optimization of deep and transformer-based models.
Maintains accuracy while significantly improving speed.
Abstract
There is a growing demand for shifting the delivery of AI capability from data centers on the cloud to edge or end devices, exemplified by the fast-emerging real-time AI-based apps running on smartphones, AR/VR devices, autonomous vehicles, and various IoT devices. The shift has, however, been seriously hampered by the large and growing gap between DNN computing demands and the computing power on edge or end devices. This article presents the design of XGen, an optimizing framework for DNNs designed to bridge the gap. XGen takes cross-cutting co-design as its first-order consideration. Its full-stack AI-oriented optimizations consist of a number of innovative optimizations at every layer of the DNN software stack, all designed in a cooperative manner. The unique technology makes XGen able to optimize various DNNs, including those with an extreme depth (e.g., BERT, GPT, other transformers), and…
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Residual Connection · Layer Normalization · Adam · WordPiece
