Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang

TL;DR
Nimble is a system that compiles and executes dynamic neural networks efficiently. It handles model dynamism through a dynamic type system and a runtime, and it outperforms state-of-the-art frameworks by up to 20x.
Contribution
The paper introduces Nimble, a novel system that enables high-performance compilation and execution of dynamic neural networks across multiple platforms.
Findings
Nimble outperforms state-of-the-art frameworks by up to 20x.
Handles dynamic control flow and tensor shapes effectively.
Supports multiple hardware platforms including CPUs and GPUs.
Abstract
Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assumptions which are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations,…
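To make the abstract's notion of dynamic tensor shapes concrete, here is a minimal Python sketch. It is not Nimble's actual implementation. It assumes a hypothetical shape representation in which None marks a dimension that is unknown until run time; the names Dim, Shape, dims_compatible, matmul_shape, and unroll are illustrative and do not come from the paper.

```python
from typing import Optional, Tuple

# Assumption: None stands for a dimension whose size is only known at run time.
Dim = Optional[int]
Shape = Tuple[Dim, ...]


def dims_compatible(a: Dim, b: Dim) -> bool:
    """Two dimensions can match if either is unknown or both are equal."""
    return a is None or b is None or a == b


def matmul_shape(lhs: Shape, rhs: Shape) -> Shape:
    """Infer the result shape of a rank-2 matrix multiply ahead of time."""
    if len(lhs) != 2 or len(rhs) != 2:
        raise ValueError("expected two rank-2 shapes")
    if not dims_compatible(lhs[1], rhs[0]):
        raise ValueError(f"inner dimensions differ: {lhs[1]} vs {rhs[0]}")
    # Unknown dimensions propagate to the result unchanged.
    return (lhs[0], rhs[1])


def unroll(steps: int, batch: Dim, hidden: int) -> Shape:
    """Apply a hidden-by-hidden weight repeatedly, as a recurrent loop would.

    The trip count `steps` stands in for data-dependent control flow:
    it is an ordinary run-time value, not part of the model definition.
    """
    state: Shape = (batch, hidden)
    weight: Shape = (hidden, hidden)
    for _ in range(steps):
        state = matmul_shape(state, weight)
    return state


if __name__ == "__main__":
    print(matmul_shape((None, 128), (128, 64)))
    print(unroll(3, None, 16))
```

In this sketch, a single check against the unknown batch dimension covers every concrete batch size, while a definite mismatch such as (32, 128) times (64, 10) is still rejected before execution. That is one simple way to read the abstract's point that optimizations must account for all possible tensor shapes.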
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Parallel Computing and Optimization Techniques
