# FPGA-accelerated machine learning inference as a service for particle physics computing

**Authors:** Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Sergo Jindariani, Suffian Khan, Benjamin Kreis, Brian Lee, Mia Liu, Vladimir Lončar, Jennifer Ngadiuba, Kevin Pedro, Brandon Perez, Maurizio Pierini, Dylan Rankin, Nhan Tran, Matthew Trahms, Aristeidis Tsaris, Colin Versteeg, Ted W. Way, Dustin Werran, Zhenbin Wu

arXiv: 1904.08986 · 2019-10-17

## TL;DR

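This paper demonstrates that serving machine learning inference from FPGAs as a web service cuts ResNet-50 inference latency by a factor of roughly 30 (cloud) to 175 (edge or on-premises) relative to CPU inference and sustains 600–700 inferences per second at a batch size of one, while potentially requiring minimal modification to the current particle physics computing model.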

## Contribution

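It shows that machine learning inference accelerated on FPGAs through Microsoft's Project Brainwave can be offered to existing experimental physics software as a cloud or edge service. The examples are ResNet-50 retrained for top quark jet tagging at the LHC and a ResNet-50 model applied with transfer learning to neutrino event classification.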

## Key findings

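- Average inference time of 60 ms as a cloud service and 10 ms as an edge or on-premises service, roughly 30 and 175 times faster than CPU inference on current experimental hardware
- A single FPGA service accessed by many CPUs achieves 600–700 inferences per second at an image batch size of one
- This throughput is comparable to large batch-size GPU inference and significantly better than small batch-size GPU inference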

## Abstract

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
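The abstract quotes service latencies together with approximate speedup factors but does not state the CPU baseline itself. As a rough back-of-the-envelope check, the Python sketch below multiplies each quoted latency by its quoted factor to recover the CPU latency those numbers imply. The names `QUOTED` and `implied_cpu_latency_ms` are illustrative and not from the paper, and because the factors are described as approximate, the results are estimates rather than measured values.

```python
# Figures quoted in the abstract:
# deployment mode -> (average service latency in ms, approximate speedup over CPU).
QUOTED = {
    "cloud": (60.0, 30.0),
    "edge_or_on_premises": (10.0, 175.0),
}


def implied_cpu_latency_ms(service_latency_ms: float, speedup: float) -> float:
    """Return the CPU latency implied by a service latency and a speedup factor."""
    return service_latency_ms * speedup


# Assumption: the speedup factor is the ratio of CPU latency to service latency.
implied = {
    mode: implied_cpu_latency_ms(latency_ms, speedup)
    for mode, (latency_ms, speedup) in QUOTED.items()
}

for mode, cpu_ms in implied.items():
    print(f"{mode}: implied CPU latency of about {cpu_ms / 1000:.2f} s")
```

Both deployment modes point to a CPU inference time of roughly 1.75 to 1.8 seconds, so the two quoted factors are mutually consistent with a single CPU baseline.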

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08986/full.md

## Figures

29 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08986/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1904.08986/full.md

---
Source: https://tomesphere.com/paper/1904.08986