Implementation of a framework for deploying AI inference engines in FPGAs

Ryan Herbst, Ryan Coffee, Nathan Fronk, Kukhee Kim, Kuktae Kim, Larry Ruckman, and J.J. Russell

arXiv:2305.19455 · physics.ins-det · June 1, 2023 · 1 cites

Open Access

TL;DR

This paper presents a software framework that enables efficient deployment of machine learning inference engines on FPGAs, optimizing data flow and latency for real-time experimental applications at LCLS2.

Contribution

It introduces a novel FPGA deployment framework with a Keras-like API, supporting full redeployment and quantization, tailored for high-throughput, low-latency scientific data processing.

Findings

01

Preliminary framework successfully deploys ML models on FPGAs.

02

Optimized data streaming reduces latency and buffer requirements.

03

Supports dynamic network reconfiguration without resynthesis.

Abstract

The LCLS2 Free Electron Laser (FEL) will deliver X-ray pulses to beamline experiments at up to 1 MHz. These experiments will require new ultra-high-rate (UHR) detectors that can operate at rates above 100 kHz and generate data throughputs upwards of 1 TB/s, a data velocity that requires prohibitively large investments in storage infrastructure. Machine learning has demonstrated the potential to digest large datasets and extract relevant insights; however, current implementations show latencies that are too high for real-time data reduction objectives. SLAC has undertaken the creation of a software framework that translates ML structures for deployment on Field Programmable Gate Arrays (FPGAs) at the edge of the data chain, close to the instrumentation. This framework leverages Xilinx's HLS framework, presenting an API modeled after the open-source Keras interface to the TensorFlow library…
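
To give a sense of scale for the storage claim in the abstract, here is a rough back-of-the-envelope sketch. Only the 1 TB/s and 100 kHz figures come from the abstract. The decimal units, the assumption of continuous running, the pairing of those two figures to estimate a per-event size, and the helper names are all illustrative assumptions, not numbers or code from the paper.

```python
# Rough storage estimate for an unreduced detector data stream.
# From the abstract: throughput "upwards of 1 TB/s", rates "above 100 kHz".
# Assumptions (not from the paper): decimal units, continuous 24 h running,
# and that the two quoted figures apply to the same detector.

TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte
SECONDS_PER_DAY = 86_400


def bytes_per_day(throughput_bytes_per_s: float) -> float:
    """Data volume accumulated over one day of continuous running."""
    return throughput_bytes_per_s * SECONDS_PER_DAY


def bytes_per_event(throughput_bytes_per_s: float, rate_hz: float) -> float:
    """Average payload per event implied by a throughput and an event rate."""
    return throughput_bytes_per_s / rate_hz


daily_pb = bytes_per_day(1 * TB) / PB
event_mb = bytes_per_event(1 * TB, 100_000) / 10**6

print(f"Unreduced volume: {daily_pb:.1f} PB per day")
print(f"Implied event size at 100 kHz: {event_mb:.1f} MB")
```

Under these assumptions, a single day of unreduced running would accumulate tens of petabytes, which illustrates why the abstract argues for reducing data at the edge rather than storing everything.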

Taxonomy

Topics: Advanced Data Storage Technologies · Particle Detector Development and Performance · Parallel Computing and Optimization Techniques

MethodsLib