Reliable edge machine learning hardware for scientific applications

Tommaso Baldi (1, 2), Javier Campos (1), Ben Hawks (1), Jennifer Ngadiuba (1), Nhan Tran (1), Daniel Diaz (3), Javier Duarte (3), Ryan Kastner (3), Andres Meza (3), Melissa Quinnan (3), Olivia Weng (3), Caleb Geniesse (4), Amir Gholami (4), Michael W. Mahoney (4), Vladimir Loncar (5), Philip Harris (5), Joshua Agar (6), Shuyu Qin (6) ((1) Fermilab, (2) University of Pisa, (3) UC San Diego, (4) UC Berkeley/LBNL/ICSI, (5) MIT, (6) Drexel University)

arXiv:2406.19522 · cs.LG · July 1, 2024

Open Access

TL;DR

This paper discusses the development and validation of reliable edge machine learning hardware tailored for scientific experiments that generate massive data, focusing on robustness, efficiency, and fault tolerance under strict resource constraints.

Contribution

The paper discusses approaches to developing and validating robust ML algorithms on edge hardware in extreme scientific environments, covering bit-accurate functional simulation, robustness under extreme quantization and pruning, and fine-grained model inspection for fault tolerance.

Findings

1. Preliminary results on robust algorithm metrics
2. Strategies for ultra-fine-grained model inspection
3. Outlook on future research directions

Abstract

Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling ultra-fine-grained model inspection for efficient fault tolerance. We discuss approaches to developing and validating reliable algorithms at the scientific edge under such strict latency, resource, power, and area requirements in extreme experimental environments. We study metrics for developing robust algorithms, present preliminary results and mitigation strategies, and conclude with an outlook of these and future directions of research towards the longer-term goal of developing autonomous…

Taxonomy

Topics: Neural Networks and Applications