Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

Scott Cyphers; Arjun K. Bansal; Anahita Bhiwandiwalla; Jayaram Bobba; Matthew Brookhart; Avijit Chakraborty; Will Constable; Christian Convey; Leona Cook; Omar Kanawi; Robert Kimball; Jason Knight; Nikolay Korovaiko; Varun Kumar; Yixing Lao; Christopher R. Lishka; Jaikrishnan Menon; Jennifer Myers; Sandeep Aswath Narayana; Adam Procter; Tristan J. Webb

arXiv:1801.08058 · cs.DC · January 31, 2018 · 105 citations

TL;DR

Intel nGraph is an intermediate representation, compiler, and executor that lets multiple deep learning frameworks share hardware-specific optimizations, reducing the manual effort needed to run each framework efficiently on each hardware platform.

Contribution

The paper introduces Intel nGraph, a C++ library that simplifies optimizing deep learning workloads across frameworks and hardware platforms, and describes its overall architecture and core components.
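The core idea of a shared intermediate representation can be illustrated with a toy model: a framework "bridge" builds a framework-neutral dataflow graph, and a "backend" supplies one kernel per operation type and executes the graph in dependency order. The sketch below is a hypothetical, pure-Python stand-in, not nGraph's actual API; the names `Node`, `topological_order`, `KERNELS`, and `execute` are invented for illustration, and plain Python floats stand in for tensors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(eq=False)
class Node:
    """One operation in a hypothetical framework-neutral dataflow graph."""
    op: str                                   # e.g. "Parameter", "Add", "Multiply"
    inputs: List["Node"] = field(default_factory=list)
    name: str = ""                            # only used for Parameter nodes


def topological_order(outputs: List[Node]) -> List[Node]:
    """Return nodes so that every node appears after all of its inputs."""
    order: List[Node] = []
    visited = set()

    def visit(n: Node) -> None:
        if id(n) in visited:
            return
        visited.add(id(n))
        for i in n.inputs:
            visit(i)
        order.append(n)

    for o in outputs:
        visit(o)
    return order


# Toy "backend": one kernel per op type. A real backend would map ops to
# hardware-specific kernels; here they are plain Python arithmetic.
KERNELS: Dict[str, Callable[..., float]] = {
    "Add": lambda a, b: a + b,
    "Multiply": lambda a, b: a * b,
    "Relu": lambda a: max(a, 0.0),
}


def execute(outputs: List[Node], feeds: Dict[str, float]) -> List[float]:
    """Evaluate the graph, reading Parameter values from `feeds`."""
    values: Dict[int, float] = {}
    for n in topological_order(outputs):
        if n.op == "Parameter":
            values[id(n)] = feeds[n.name]
        else:
            values[id(n)] = KERNELS[n.op](*(values[id(i)] for i in n.inputs))
    return [values[id(o)] for o in outputs]


# What a framework bridge might emit for y = relu(x * w + b).
x = Node("Parameter", name="x")
w = Node("Parameter", name="w")
b = Node("Parameter", name="b")
y = Node("Relu", [Node("Add", [Node("Multiply", [x, w]), b])])

print(execute([y], {"x": 2.0, "w": -3.0, "b": 1.0}))  # -> [0.0]
```

Because the graph knows nothing about the framework that built it, supporting a new platform in this model only means supplying a new kernel table rather than touching every framework.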

Findings

01

Supports TensorFlow, MXNet, and neon frameworks.

02

Initial backends include CPUs, NVIDIA GPUs, and Intel NNP.

03

Provides compiler optimizations such as efficient memory management and data layout abstraction.
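One common form of compiler-level memory management is liveness-based buffer reuse: once the last operation that reads a tensor has run, that tensor's buffer can hold a later result. The sketch below illustrates that general technique on a straight-line schedule. It is a toy, not nGraph's actual memory pass; the function `plan_memory`, the schedule format, and the best-fit reuse policy are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical schedule format: (output tensor, output size in bytes, input tensors).
Schedule = List[Tuple[str, int, List[str]]]


def plan_memory(schedule: Schedule, outputs: List[str]) -> Tuple[Dict[str, int], int]:
    """Assign each produced tensor a buffer id, reusing buffers of dead tensors.

    Returns (tensor -> buffer id, total bytes across all buffers).
    """
    # Last step that reads each tensor; graph outputs must survive to the end.
    last_use: Dict[str, int] = {}
    for step, (_, _, inputs) in enumerate(schedule):
        for t in inputs:
            last_use[t] = step
    for t in outputs:
        last_use[t] = len(schedule)

    buffer_size: List[int] = []   # size of each buffer created so far
    free: List[int] = []          # buffer ids available for reuse
    assignment: Dict[str, int] = {}
    for step, (out, size, inputs) in enumerate(schedule):
        # Best fit: the smallest free buffer that is large enough.
        fits = [buf for buf in free if buffer_size[buf] >= size]
        if fits:
            buf = min(fits, key=lambda b: buffer_size[b])
            free.remove(buf)
        else:
            buf = len(buffer_size)
            buffer_size.append(size)
        assignment[out] = buf
        # Inputs last read at this step are now dead; release their buffers.
        # (Externally supplied inputs were never assigned a buffer, so skip them.)
        for t in dict.fromkeys(inputs):
            if last_use.get(t) == step and t in assignment:
                free.append(assignment[t])
    return assignment, sum(buffer_size)


# A four-op chain x -> a -> b -> c -> d, each intermediate 100 bytes.
schedule: Schedule = [("a", 100, ["x"]), ("b", 100, ["a"]),
                      ("c", 100, ["b"]), ("d", 100, ["c"])]
assignment, total = plan_memory(schedule, outputs=["d"])
print(assignment, total)  # {'a': 0, 'b': 1, 'c': 0, 'd': 1} 200
```

In this example two buffers alternate along the chain, so 200 bytes suffice where allocating a fresh buffer per tensor would take 400. The sketch conservatively never lets an output overwrite one of its own inputs, which a more aggressive planner could allow for in-place operations.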

Abstract

The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $O(fp)$ effort, where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on…
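The $O(fp)$ figure counts one integration per (framework, platform) pair. For contrast, the sketch below assumes a simple model of a shared intermediate representation: each framework connects once to the IR through a "bridge", and each platform supplies one "backend", giving roughly $f + p$ integrations. The framework and platform names come from the Findings above; the `effort` helper and the bridge/backend counting model are illustrative assumptions, not figures reported in the paper.

```python
from itertools import product

# Frameworks and backends listed in the Findings; the f + p model is an assumption.
frameworks = ["TensorFlow", "MXNet", "neon"]
platforms = ["CPU", "NVIDIA GPU", "Intel NNP"]

# Direct optimization: every (framework, platform) pair is its own integration.
direct = list(product(frameworks, platforms))

# Shared-IR model: one bridge per framework plus one backend per platform.
via_ir = [("bridge", f) for f in frameworks] + [("backend", p) for p in platforms]


def effort(f: int, p: int) -> tuple:
    """Return (direct, shared-IR) integration counts for f frameworks and p platforms."""
    return f * p, f + p


print(len(direct), len(via_ir))  # 9 6
print(effort(10, 10))            # (100, 20)
```

With three frameworks and three platforms the gap is modest (9 versus 6), but it widens quickly: ten of each would mean 100 pairwise integrations versus 20 under the shared-IR model.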

Code & Models

Repositories

NervanaSystems/ngraph-python

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems