# Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL

**Authors:** Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst

arXiv: 2508.21739 · 2025-09-01

## TL;DR

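This paper presents SNL, SLAC's framework for deploying real-time neural network inference on FPGAs, and introduces Auto-SNL, a Python extension that converts Python-based models into SNL-compatible high-level synthesis code. In benchmarks against hls4ml on a Xilinx ZCU102, SNL achieves competitive or superior latency in most tested architectures and, in some cases, also saves FPGA resources.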

## Contribution

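The paper presents SNL, an FPGA deployment framework whose model weights can be updated dynamically without FPGA resynthesis, and Auto-SNL, a Python extension that converts Python-based neural network models into SNL-compatible high-level synthesis code. It also reports a benchmark comparison against hls4ml across multiple neural network architectures, fixed-point precisions, and synthesis configurations on a Xilinx ZCU102.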

## Key findings

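- SNL achieves competitive or superior latency relative to hls4ml in most tested architectures.
- In some cases, SNL also uses fewer FPGA resources than hls4ml.
- Auto-SNL streamlines the conversion of Python-based neural network models into SNL-compatible high-level synthesis code.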

## Abstract

The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s. Managing such massive data streams presents significant challenges, as transmission and storage infrastructures become prohibitively expensive. Machine learning (ML) offers a promising solution for real-time data reduction, but conventional implementations introduce excessive latency, making them unsuitable for high-speed experimental environments. To address these challenges, SLAC developed the SLAC Neural Network Library (SNL), a specialized framework designed to deploy real-time ML inference models on Field-Programmable Gate Arrays (FPGA). SNL's key feature is the ability to dynamically update model weights without requiring FPGA resynthesis, enhancing flexibility for adaptive learning applications. To further enhance usability and accessibility, we introduce Auto-SNL, a Python extension that streamlines the process of converting Python-based neural network models into SNL-compatible high-level synthesis code. This paper presents a benchmark comparison against hls4ml, the current state-of-the-art tool, across multiple neural network architectures, fixed-point precisions, and synthesis configurations targeting a Xilinx ZCU102 FPGA. The results showed that SNL achieves competitive or superior latency in most tested architectures, while in some cases also offering FPGA resource savings. This adaptation demonstrates SNL's versatility, opening new opportunities for researchers and academics in fields such as high-energy physics, medical imaging, robotics, and many more.
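The benchmark in the abstract varies fixed-point precision. As a rough illustration of what that parameter controls, the sketch below quantizes values to a signed fixed-point grid and evaluates a single neuron with them. It is not code from SNL, Auto-SNL, or hls4ml: the function names `quantize_fixed` and `dense_fixed` are hypothetical, and the choice of round-to-nearest with saturation is an assumption made for this example (HLS fixed-point types can be configured with other rounding and overflow modes).

```python
import math
from typing import Sequence


def quantize_fixed(x: float, total_bits: int, int_bits: int) -> float:
    """Map x onto a signed fixed-point grid (hypothetical helper).

    total_bits is the full word width; int_bits is the number of integer
    bits including the sign bit. Assumed behavior for this sketch:
    round half up, then saturate at the representable range.
    """
    if not 0 < int_bits <= total_bits:
        raise ValueError("need 0 < int_bits <= total_bits")
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                    # grid step is 1 / scale
    lowest = -(1 << (total_bits - 1))         # most negative integer code
    highest = (1 << (total_bits - 1)) - 1     # most positive integer code
    code = math.floor(x * scale + 0.5)        # round to the nearest code
    code = max(lowest, min(highest, code))    # saturate instead of wrapping
    return code / scale


def dense_fixed(inputs: Sequence[float], weights: Sequence[float],
                bias: float, total_bits: int, int_bits: int) -> float:
    """One output neuron with quantized operands and a quantized result."""
    acc = quantize_fixed(bias, total_bits, int_bits)
    for w, x in zip(weights, inputs):
        acc += (quantize_fixed(w, total_bits, int_bits)
                * quantize_fixed(x, total_bits, int_bits))
    return quantize_fixed(acc, total_bits, int_bits)


if __name__ == "__main__":
    # Wider words leave more fractional bits, so the rounding error shrinks.
    for bits in (8, 12, 16):
        q = quantize_fixed(0.3, bits, 2)
        print(f"{bits:2d} bits: {q!r} (error {abs(q - 0.3):.2e})")
```

Narrower words generally cost fewer FPGA resources but lose accuracy, which is one reason a comparison across several precisions is informative.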

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21739/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21739/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/2508.21739/full.md

---
Source: https://tomesphere.com/paper/2508.21739