In-Storage Embedded Accelerator for Sparse Pattern Processing
Sang-Woo Jun, Huy T. Nguyen, Vijay N. Gadepally, Arvind

TL;DR
This paper introduces an in-storage embedded accelerator architecture for sparse pattern processing on large data sets. A prototype outperforms C/C++ software solutions on a 16-core system at a fraction of the power and cost.
Contribution
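The paper presents an architecture that pairs flash storage with embedded accelerators for sparse pattern processing, the core operation in applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing.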
Findings
One slice of the prototype accelerator handles up to 1TB of data
Outperforms C/C++ software solutions on a 16-core system at a fraction of the power and cost
An optimized version of the accelerator matches the performance of a 48-core server
Abstract
We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing, bioinformatics, subgraph matching, machine learning, and graph processing. One slice of our prototype accelerator is capable of handling up to 1TB of data, and experiments show that it can outperform C/C++ software solutions on a 16-core system at a fraction of the power and cost; an optimized version of the accelerator can match the performance of a 48-core server.
