Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Hendrik Borras; Giuseppe Di Guglielmo; Javier Duarte; Nicolò Ghielmetti; Ben Hawks; Scott Hauck; Shih-Chieh Hsu; Ryan Kastner; Jason Liang; Andres Meza; Jules Muhizi; Tai Nguyen; Rushil Roy; Nhan Tran; Yaman Umuroglu; Olivia Weng; Aidan Yokuda; and Michaela Blott

arXiv:2206.11791 · cs.LG · June 24, 2022 · 5 citations

TL;DR

This paper describes open-source, FPGA-based neural network implementations for the MLPerf Tiny benchmark, built with the hls4ml and FINN workflows and deployed on the Pynq-Z2 and Arty A7-100T platforms.

Contribution

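It presents a full workflow, from quantization-aware training to FPGA implementation, built on open-source tools for the MLPerf Tiny keyword spotting, anomaly detection, and image classification tasks, along with new generic optimizations and configurable spatial dataflow architectures tailored for speed and efficiency.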

Findings

01 Achieved inference latencies as low as 20 microseconds.

02 Energy consumption as low as 30 microjoules per inference.

03 Deployed solutions on Pynq-Z2 and Arty A7-100T FPGA platforms.

Abstract

We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 μs and…
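The abstract mentions quantization-aware training without showing what quantization does to a weight. As a rough illustration only, the sketch below applies symmetric uniform "fake quantization" to a few values: each value is snapped to a signed low-bit grid and converted back to a float, which is the kind of rounding a network is exposed to during such training. The function name `fake_quantize`, the 3-bit width, the clipping range, and the sample weights are all assumptions made for this example; this is not the scheme, bit width, or API used by hls4ml, FINN, or the submissions described here.

```python
def fake_quantize(x, bits=8, x_max=1.0):
    """Snap x to a signed `bits`-bit uniform grid on [-x_max, x_max].

    Hypothetical helper for illustration; returns the dequantized float.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    levels = 2 ** (bits - 1) - 1          # largest integer code, e.g. 3 for 3 bits
    scale = x_max / levels                # spacing between adjacent grid points
    q = round(x / scale)                  # nearest integer code (ties go to even)
    q = max(-levels, min(levels, q))      # saturate values outside the range
    return q * scale


# Made-up example weights; the last one lies outside the clipping range.
weights = [0.8, -0.33, 0.05, 1.7]
quantized = [fake_quantize(w, bits=3) for w in weights]

for w, wq in zip(weights, quantized):
    print(f"{w:+.3f} -> {wq:+.4f}")
```

With 3 bits the grid has only seven points, so nearby weights collapse onto the same value and out-of-range weights saturate at the clipping limit. Training with this rounding in the loop lets the network adapt to it before deployment on hardware with narrow datapaths.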

Code & Models

Repositories

mlcommons/tiny_results_v0.7/tree/main/open/hls4ml-finn (official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Neural Networks and Applications · Image Processing Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings