Hidden costs for inference with deep network on embedded system devices

Chankyu Lee; Woohyun Choi; Sangwook Park

arXiv:2601.01698 · cs.CC · January 6, 2026

Open Access

TL;DR

This paper investigates the limitations of using Multiply-Accumulate operations as a metric for inference performance of deep learning models on embedded systems, emphasizing the need to consider additional computational costs.

Contribution

It reveals the overlooked computational costs beyond Multiply-Accumulate operations that impact inference time on embedded devices.

Findings

01

Multiply-Accumulate operations alone do not accurately predict inference time.

02

Additional tensor computations significantly affect inference performance.

03

Optimizing models for embedded systems requires considering these extra costs.
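To make the gap between the Multiply-Accumulate count and the total work concrete, here is a minimal sketch of how the count is usually derived for a convolution, next to the element-wise work that the count leaves out. The layer sizes are hypothetical, and the choice of residual addition and activation as the "additional tensor computations" is an illustrative assumption, not taken from the paper.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Theoretical MACs of a 2D convolution with a square kernel.

    Each output element needs one MAC per kernel weight it reads:
    (c_in / groups) * kernel * kernel.
    """
    return (c_in // groups) * kernel * kernel * c_out * h_out * w_out


def elementwise_elements(channels, height, width):
    """Elements touched by one element-wise op (e.g. an addition or activation).

    These operations contribute zero MACs, yet every element is still
    read from and written to memory.
    """
    return channels * height * width


# Hypothetical residual block: two 3x3 convolutions on a 64-channel,
# 32x32 feature map (CIFAR-sized input), plus two activations and one
# residual addition. These sizes are assumptions for illustration only.
block_macs = 2 * conv2d_macs(64, 64, 3, 32, 32)
uncounted_elements = 3 * elementwise_elements(64, 32, 32)

# The same layer as a depthwise convolution: far fewer MACs, but the
# element-wise traffic around it is unchanged.
depthwise_macs = conv2d_macs(64, 64, 3, 32, 32, groups=64)

print(f"block MACs:              {block_macs:,}")
print(f"uncounted element visits: {uncounted_elements:,}")
print(f"depthwise MACs:          {depthwise_macs:,}")
```

The MAC count falls sharply when a convolution is replaced by its depthwise variant, while the element-wise work stays the same, so the uncounted share of the total grows. How much that share costs in wall-clock time depends on the device and has to be measured.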

Abstract

This study evaluates the inference performance of various deep learning models in an embedded system environment. In previous work, the number of Multiply-Accumulate operations is typically used to measure the computational load of a deep model. According to this study, however, this metric is limited in its ability to estimate inference time on embedded devices. This paper asks what is overlooked when computational cost is expressed in terms of Multiply-Accumulate operations. In the experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the inference times of ten deep models are compared with the theoretically calculated Multiply-Accumulate operations for each model. The results highlight the importance of considering additional computations between tensors when optimizing deep learning models for real-time performance on embedded systems.
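Comparing measured inference time with a theoretical operation count requires a repeatable latency measurement. The following is a minimal sketch using only the Python standard library. The function `median_latency_ms`, its parameters, and the stand-in workload `fake_forward_pass` are hypothetical and do not reproduce the measurement procedure used in the paper.

```python
import statistics
import time


def median_latency_ms(fn, warmup=5, repeats=30):
    """Return the median wall-clock latency of fn() in milliseconds.

    Warm-up calls are discarded so that one-time costs (caches, lazy
    initialization) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_forward_pass():
    # Hypothetical stand-in for a model's forward pass on one image.
    return sum(i * i for i in range(10_000))


latency = median_latency_ms(fake_forward_pass)
print(f"median latency: {latency:.3f} ms")
```

The median is used rather than the mean because occasional scheduler or thermal hiccups on a small device can produce outliers. In practice, `fake_forward_pass` would be replaced by a call that runs the model under test on a fixed input.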

Taxonomy

Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Parallel Computing and Optimization Techniques