Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka

TL;DR
Intel nGraph is a versatile intermediate representation and compiler designed to optimize deep learning performance across multiple frameworks and hardware platforms, reducing manual effort and enabling scalable, hardware-agnostic deep learning deployment.
Contribution
The paper introduces Intel nGraph, a novel C++ library that simplifies cross-framework and cross-platform optimization for deep learning workloads, extending support to various hardware and frameworks.
Findings
Supports TensorFlow, MXNet, and neon frameworks.
Initial backends include CPUs, NVIDIA GPUs, and Intel NNP.
Provides compiler optimizations like memory management and data layout abstraction.
Abstract
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires O(fp) effort, where f is the number of frameworks and p is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on…
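The scaling argument in the abstract can be made concrete with a small counting sketch. It is not taken from the paper: it assumes that a shared intermediate representation needs one bridge per framework and one backend per platform, and it reuses the framework and backend names from the Findings list. The function names are hypothetical.

```python
from itertools import product

# Names come from the Findings list; the counting model is an assumption.
frameworks = ["TensorFlow", "MXNet", "neon"]
backends = ["CPU", "NVIDIA GPU", "Intel NNP"]


def direct_integrations(frameworks, backends):
    """Direct optimization: one integration per (framework, backend) pair."""
    return list(product(frameworks, backends))


def shared_ir_integrations(frameworks, backends):
    """Assumed shared-IR model: one bridge per framework, one backend per platform."""
    bridges = [(fw, "IR") for fw in frameworks]
    lowerings = [("IR", be) for be in backends]
    return bridges + lowerings


direct = direct_integrations(frameworks, backends)
shared = shared_ir_integrations(frameworks, backends)

print(f"direct optimization: {len(direct)} integrations")
print(f"shared IR:           {len(shared)} integrations")
```

Under this model, adding a fourth framework costs one new bridge with a shared IR, but one new integration per backend (three here) with direct optimization.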
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems
