Implementation of a framework for deploying AI inference engines in FPGAs
Ryan Herbst, Ryan Coffee, Nathan Fronk, Kukhee Kim, Kuktae Kim, Larry Ruckman, and J.J. Russell

TL;DR
This paper presents a software framework that enables efficient deployment of machine learning inference engines on FPGAs, optimizing data flow and latency for real-time experimental applications at LCLS2.
Contribution
It introduces a novel FPGA deployment framework with a Keras-like API, supporting full redeployment and quantization, tailored for high-throughput, low-latency scientific data processing.
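The summary says only that the framework supports quantization; it does not name a scheme. As an illustration of what quantizing a weight for fixed-point FPGA arithmetic can look like, here is a minimal sketch that assumes signed fixed point with round-to-nearest and saturation. The function name `quantize_fixed` and the bit widths are hypothetical and are not taken from the framework.

```python
def quantize_fixed(value, total_bits=8, frac_bits=4):
    """Quantize a float to signed fixed point and return the representable value.

    Assumed scheme (not from the paper): round to nearest, then saturate.
    """
    scale = 1 << frac_bits                      # 2**frac_bits steps per unit
    lowest = -(1 << (total_bits - 1))           # most negative integer code
    highest = (1 << (total_bits - 1)) - 1       # most positive integer code
    code = round(value * scale)                 # nearest integer code
    code = max(lowest, min(highest, code))      # saturate instead of wrapping
    return code / scale


weights = [0.30, -1.26, 9.99]
quantized = [quantize_fixed(w) for w in weights]
print(quantized)
```

With 8 total bits and 4 fractional bits, the step size is 1/16 and the largest representable value is 7.9375, so the last weight saturates. Narrower formats shrink the logic needed per multiply at the cost of larger rounding error.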
Findings
Preliminary framework successfully deploys ML models on FPGAs.
Optimized data streaming reduces latency and buffer requirements.
Supports dynamic network reconfiguration without resynthesis.
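The last finding can be pictured as a split between structure and parameters: the layer dimensions are fixed when the design is synthesized, while weights and biases live in memory that can be rewritten at run time. The toy model below sketches that idea in software only. The class `DenseEngine` and its methods are hypothetical and do not represent the framework's API or how it implements reconfiguration.

```python
class DenseEngine:
    """Toy stand-in for a synthesized dense layer with fixed dimensions."""

    def __init__(self, n_in, n_out):
        # In this model the dimensions play the role of the synthesized structure.
        self.n_in = n_in
        self.n_out = n_out
        self.weights = [[0.0] * n_in for _ in range(n_out)]
        self.biases = [0.0] * n_out

    def load_parameters(self, weights, biases):
        """Overwrite parameter memory; shapes must match the fixed structure."""
        if len(weights) != self.n_out or any(len(row) != self.n_in for row in weights):
            raise ValueError("weight shape does not match the synthesized layer")
        if len(biases) != self.n_out:
            raise ValueError("bias length does not match the synthesized layer")
        self.weights = [list(row) for row in weights]
        self.biases = list(biases)

    def infer(self, x):
        """Dense layer followed by ReLU."""
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weights, self.biases)
        ]


engine = DenseEngine(n_in=2, n_out=1)
engine.load_parameters([[1.0, 2.0]], [0.5])
first = engine.infer([1.0, 1.0])

# New parameters, same structure: no rebuild of the engine object is needed.
engine.load_parameters([[-1.0, 0.0]], [0.0])
second = engine.infer([1.0, 1.0])
print(first, second)
```

A parameter set with a different shape is rejected, which mirrors the constraint that only changes fitting the already-built structure can avoid resynthesis in this simplified picture.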
Abstract
The LCLS2 Free Electron Laser (FEL) will generate x-ray pulses for beamline experiments at up to 1 MHz. These experiments will require new ultra-high-rate (UHR) detectors that can operate at rates above 100 kHz and generate data throughputs upwards of 1 TB/s, a data velocity which requires prohibitively large investments in storage infrastructure. Machine Learning has demonstrated the potential to digest large datasets to extract relevant insights; however, current implementations show latencies that are too high for real-time data reduction objectives. SLAC has endeavored to create a software framework which translates ML structures for deployment on Field Programmable Gate Arrays (FPGAs) deployed at the Edge of the data chain, close to the instrumentation. This framework leverages Xilinx's HLS framework, presenting an API modeled after the open source Keras interface to the TensorFlow library…
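The rates quoted in the abstract can be turned into two rough figures: the time between pulses at the maximum repetition rate, and the raw volume produced per day if nothing is reduced. The short calculation below is a back-of-the-envelope sketch. It assumes 1 TB = 10^12 bytes and continuous running for 24 hours, neither of which is stated in the source.

```python
PULSE_RATE_HZ = 1_000_000            # up to 1 MHz, from the abstract
THROUGHPUT_BYTES_PER_S = 10**12      # 1 TB/s, assuming decimal terabytes
SECONDS_PER_DAY = 86_400             # assumes uninterrupted running

# Time between consecutive pulses at the maximum repetition rate.
pulse_period_us = 1e6 / PULSE_RATE_HZ

# Unreduced data volume accumulated over one day.
bytes_per_day = THROUGHPUT_BYTES_PER_S * SECONDS_PER_DAY
petabytes_per_day = bytes_per_day / 1e15

print(f"pulse period: {pulse_period_us} us")
print(f"raw volume:   {petabytes_per_day} PB/day")
```

Under these assumptions a pipelined inference engine has to accept a new sample roughly every microsecond to keep pace with the maximum pulse rate, and storing the unreduced stream would take tens of petabytes per day.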
Taxonomy
Topics: Advanced Data Storage Technologies · Particle Detector Development and Performance · Parallel Computing and Optimization Techniques
Methods: Lib
