In-Storage Embedded Accelerator for Sparse Pattern Processing

Sang-Woo Jun; Huy T. Nguyen; Vijay N. Gadepally; Arvind

arXiv:1611.03380 · cs.AR · January 25, 2017

TL;DR

This paper introduces an architecture for sparse pattern processing on large data sets that embeds accelerators in flash storage. A prototype outperforms C/C++ software solutions on a 16-core system at a fraction of the power and cost.

Contribution

The paper presents an in-storage architecture for sparse pattern processing, built from flash storage with embedded accelerators, in which one slice of the prototype handles up to 1TB of data.

Findings

01 One slice of the prototype accelerator handles up to 1TB of data

02 Outperforms C/C++ software solutions on a 16-core system at a fraction of the power and cost

03 An optimized version of the accelerator matches the performance of a 48-core server

Abstract

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing. One slice of our prototype accelerator is capable of handling up to 1TB of data, and experiments show that it can outperform C/C++ software solutions on a 16-core system at a fraction of the power and cost; an optimized version of the accelerator can match the performance of a 48-core server.
