MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

Enrico Russo; Mohamed Amine Hamdi; Alessandro Ottaviano; Francesco Conti; Angelo Garofalo; Daniele Jahier Pagliari; Maurizio Palesi; Luca Benini; Alessio Burrello

arXiv:2604.09124·cs.DC·April 13, 2026



TL;DR

MATCHA is a deployment framework that optimizes deep neural network execution on multi-accelerator heterogeneous edge SoCs. On the MLPerf Tiny benchmark, it improves accelerator utilization and reduces inference latency by up to 35% relative to the MATCH compiler.

Contribution

MATCHA introduces a unified deployment approach that uses constraint programming for memory allocation and scheduling, and pattern matching, tiling, and mapping across individual hardware units, to exploit hardware heterogeneity.

Findings

01

Achieves up to a 35% reduction in inference latency on the MLPerf Tiny benchmark.

02

Improves accelerator utilization compared to the state-of-the-art MATCH compiler.

03

Uses constraint programming to optimize L3/L2 memory allocation and scheduling across heterogeneous accelerators.

Abstract

Deploying DNNs on Systems-on-Chip (SoCs) with multiple heterogeneous acceleration engines is challenging, and most deployment frameworks cannot fully exploit this heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Pattern matching, tiling, and mapping across individual HW units enable parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using an SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the state-of-the-art MATCH compiler.
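
To make the scheduling idea concrete, the following is a minimal sketch, not MATCHA's implementation. It assigns the operators of a tiny, made-up DNN graph to two heterogeneous accelerators and picks the assignment with the shortest end-to-end latency. Everything in it is an assumption for illustration: the operator names, the per-accelerator latencies, the unit names `acc0` and `acc1`, and the fixed topological issue order. Exhaustive search stands in for a constraint-programming solver, and memory allocation is not modeled.

```python
from itertools import product

# Hypothetical per-operator latencies (arbitrary cycles) on each accelerator.
LATENCY = {
    "conv1":  {"acc0": 4, "acc1": 6},
    "conv2a": {"acc0": 5, "acc1": 3},
    "conv2b": {"acc0": 5, "acc1": 7},
    "add":    {"acc0": 2, "acc1": 2},
}

# Hypothetical data dependencies: two parallel branches joined by an add.
DEPS = {
    "conv1": [],
    "conv2a": ["conv1"],
    "conv2b": ["conv1"],
    "add": ["conv2a", "conv2b"],
}

# Operators are issued in this fixed topological order (a simplification).
ORDER = ["conv1", "conv2a", "conv2b", "add"]
UNITS = ["acc0", "acc1"]


def makespan(assignment):
    """Return the finish time of the last operator for a given mapping."""
    unit_free = {unit: 0 for unit in UNITS}  # time at which each unit is idle
    finish = {}
    for op in ORDER:
        unit = assignment[op]
        # An operator starts once its inputs are ready and its unit is idle.
        ready = max((finish[dep] for dep in DEPS[op]), default=0)
        start = max(ready, unit_free[unit])
        finish[op] = start + LATENCY[op][unit]
        unit_free[unit] = finish[op]
    return max(finish.values())


def best_schedule():
    """Enumerate every operator-to-unit mapping and keep the fastest one."""
    best = None
    for choice in product(UNITS, repeat=len(ORDER)):
        assignment = dict(zip(ORDER, choice))
        span = makespan(assignment)
        if best is None or span < best[0]:
            best = (span, assignment)
    return best


if __name__ == "__main__":
    span, assignment = best_schedule()
    print("best makespan:", span)
    print("assignment:", assignment)
    print("acc0 only:", makespan({op: "acc0" for op in ORDER}))
```

In this toy graph, running the two branches concurrently on different units beats mapping every operator to the single fastest unit. A real deployment flow would replace the enumeration with a solver and add memory-capacity constraints, since the search space grows exponentially with the number of operators.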
