Optimizing edge AI models on HPC systems with the edge in the loop

Marcel Aach; Cyril Blanc; Andreas Lintermann; Kurt De Grave

arXiv:2505.19995 · cs.DC · November 26, 2025

TL;DR

This paper presents a hardware-aware Neural Architecture Search (NAS) workflow that couples an edge device with an HPC system to optimize AI models for edge deployment, reaching 8.8x faster inference and a 1.35x gain in model quality on an additive manufacturing use case.

Contribution

It introduces a NAS workflow that combines real-time latency measurements taken on the target edge hardware with candidate training on an HPC system, tailored to optimizing edge AI models.

Findings

01 8.8x faster inference

02 Model quality improved by a factor of 1.35

03 Validated on an additive manufacturing dataset

Abstract

Artificial intelligence and machine learning models deployed on edge devices, e.g., for quality control in Additive Manufacturing (AM), are frequently small in size. Such models usually have to deliver highly accurate results within a short time frame. Methods commonly employed in the literature start out with larger trained models and try to reduce their memory and latency footprint by structural pruning, knowledge distillation, or quantization. It is, however, also possible to leverage hardware-aware Neural Architecture Search (NAS), an approach that seeks to systematically explore the architecture space to find optimized configurations. In this study, a hardware-aware NAS workflow is introduced that couples an edge device located in Belgium with a powerful High-Performance Computing system in Germany, to train possible architecture candidates as fast as possible while performing…
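The idea of hardware-aware NAS described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` fields, the random sampling, the Pareto selection, and the toy scoring functions are all assumptions made for illustration, and only the Python standard library is used. The sketch scores each candidate on two objectives, model quality (which would come from training on the HPC system) and latency (which would come from a measurement on the edge device), and keeps the candidates that are not dominated on both.

```python
import random
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical architecture description, not the paper's search space."""
    depth: int  # assumed: number of layers
    width: int  # assumed: channels per layer


# (candidate, quality where higher is better, latency in seconds where lower is better)
Scored = Tuple[Candidate, float, float]


def time_on_device(run_inference: Callable[[], None], repeats: int = 5) -> float:
    """Median wall-clock latency in seconds of one inference call.

    In an edge-in-the-loop setup this would run on the target device;
    here it simply times a local callable.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sample_candidates(n: int, seed: int = 0) -> List[Candidate]:
    """Draw n random candidates from a small, made-up search space."""
    rng = random.Random(seed)
    return [Candidate(rng.randint(1, 8), rng.choice([8, 16, 32, 64])) for _ in range(n)]


def pareto_front(scored: List[Scored]) -> List[Scored]:
    """Keep entries that no other entry beats on both quality and latency."""
    front = []
    for cand, quality, latency in scored:
        dominated = any(
            q >= quality and l <= latency and (q > quality or l < latency)
            for _, q, l in scored
        )
        if not dominated:
            front.append((cand, quality, latency))
    return front


def search(
    candidates: List[Candidate],
    train_and_evaluate: Callable[[Candidate], float],
    measure_latency: Callable[[Candidate], float],
) -> List[Scored]:
    """Score every candidate on both objectives and return the Pareto front."""
    scored = [(c, train_and_evaluate(c), measure_latency(c)) for c in candidates]
    return pareto_front(scored)


# Toy stand-ins: real training and real device timing would replace these.
def toy_quality(c: Candidate) -> float:
    return 1.0 - 1.0 / (c.depth * c.width)


def toy_latency(c: Candidate) -> float:
    return 1e-4 * c.depth * c.width


front = search(sample_candidates(20), toy_quality, toy_latency)
for cand, quality, latency in sorted(front, key=lambda s: s[2]):
    print(f"{cand}  quality={quality:.4f}  latency={latency * 1e3:.2f} ms")
```

In a real setup, `measure_latency` would wrap something like `time_on_device` around an inference call executed on the edge hardware, and `train_and_evaluate` would submit a training job to the HPC system. Passing both in as callables keeps the selection logic independent of where each measurement actually runs.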

Code & Models

Repositories

flanders-make-vzw/hpc2edge (PyTorch, official)

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Attention Model