ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

Hajar Falahati; Pejman Lotfi-Kamran; Mohammad Sadrosadati; Hamid Sarbazi-Azad

arXiv:1812.11473·cs.LG·January 10, 2019·5 cites

Open Access

TL;DR

ORIGAMI introduces a heterogeneous in-memory acceleration architecture supporting diverse ML algorithms by combining pattern matching, specialized compute engines, and split execution with external platforms, achieving significant performance and energy efficiency improvements.

Contribution

It presents a novel heterogeneous in-memory accelerator design with pattern matching and computation splitting, enabling efficient support for various ML algorithms within strict power and area constraints.

Findings

1. Outperforms state-of-the-art in-memory accelerators by up to 1.6x in performance.

2. Reduces energy-delay product by up to 31x.

3. Achieves results within 1% of an ideal unlimited-resource system.

Abstract

The memory bandwidth bottleneck is a major challenge in processing machine learning (ML) algorithms. In-memory acceleration has the potential to address this problem; however, it must overcome two challenges. First, an in-memory accelerator should be general enough to support a large set of different ML algorithms. Second, it should be efficient enough to utilize the available bandwidth while meeting the limited power and area budgets of the logic layer of a 3D-stacked memory. We observe that previous work fails to address both challenges simultaneously. We propose ORIGAMI, a heterogeneous set of in-memory accelerators that supports the compute demands of different ML algorithms and also uses an off-the-shelf compute platform (e.g., FPGA, GPU, TPU) to utilize bandwidth without violating strict area and power budgets. ORIGAMI offers a pattern-matching technique to identify similar computation patterns of ML algorithms…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Algorithms