Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Olivia Weng; Alexander Redding; Nhan Tran; Javier Mauricio Duarte; Ryan Kastner

arXiv:2403.08980·cs.LG·March 15, 2024·1 cites

Open Access

TL;DR

This paper discusses the architectural challenges and solutions for implementing neural network inference in high data-rate, low-latency scientific applications, emphasizing on-chip storage and custom hardware design.

Contribution

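The paper argues that scientific applications with extreme throughput and latency demands need fully on-chip neural network inference hardware, and it identifies two architectural requirements: all NN parameters must fit on-chip, and custom or reconfigurable logic is often needed to meet the latency and bandwidth constraints.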

Findings

01

Neural networks must be stored entirely on-chip in these applications.

02

Custom or reconfigurable hardware is often required to meet latency and bandwidth constraints.

03

Many scientific NN applications require dedicated, fully on-chip inference solutions.

Abstract

With more scientific fields relying on neural networks (NNs) to process data arriving at extreme throughputs and under extreme latency constraints, it is crucial to develop NNs with all their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Moreover, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as the data is being produced (e.g., every 25 ns). These latency and bandwidth requirements therefore have architectural implications for the hardware intended to run these NNs: 1) all NN parameters must fit on-chip, and 2) codesigning custom/reconfigurable logic is often required to meet these latency and bandwidth constraints. In our work, we show that many scientific NN applications must run fully on-chip, in the extreme case requiring a custom chip to meet such stringent constraints.
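The bandwidth argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 10,000-parameter model, the 8-bit precision, and the on-chip capacity are hypothetical values chosen for the example, not figures from the paper. Only the 25 ns interval comes from the abstract. It estimates the memory bandwidth that would be needed if every weight had to be re-fetched from off-chip memory once per inference.

```python
def required_weight_bandwidth_gbps(num_params: int, bits_per_param: int,
                                   interval_ns: float) -> float:
    """Bandwidth (GB/s) needed to stream every weight once per inference.

    One byte per nanosecond equals 1e9 bytes per second, i.e. 1 GB/s,
    so bytes / nanoseconds gives GB/s directly.
    """
    bytes_per_inference = num_params * bits_per_param / 8
    return bytes_per_inference / interval_ns


def fits_on_chip(num_params: int, bits_per_param: int, on_chip_bits: int) -> bool:
    """True if all parameters fit in the given on-chip storage budget."""
    return num_params * bits_per_param <= on_chip_bits


# Hypothetical model: 10,000 parameters at 8 bits each (assumed, not from the paper).
# The 25 ns interval is the example data-arrival period quoted in the abstract.
bandwidth = required_weight_bandwidth_gbps(10_000, 8, 25.0)
print(f"{bandwidth:.0f} GB/s")  # prints "400 GB/s"

# Hypothetical on-chip budget of 100,000 bits (assumed for illustration).
print(fits_on_chip(10_000, 8, 100_000))  # prints "True"
```

Even this small hypothetical model would need hundreds of GB/s of sustained weight traffic if nothing were kept on-chip, which illustrates why the abstract treats on-chip parameter storage as a requirement rather than an optimization.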

Taxonomy

Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Advanced Neural Network Applications