On Accelerating Edge AI: Optimizing Resource-Constrained Environments

Jacob Sander; Achraf Cohen; Venkat R. Dasari; Brent Venable; Brian Jalaian

arXiv:2501.15014·cs.LG·January 30, 2025·2 cites

Open Access

TL;DR

This survey reviews strategies for optimizing deep learning models in resource-limited edge environments, focusing on model compression, neural architecture search, and deployment frameworks to enhance performance and efficiency.

Contribution

It provides a comprehensive overview of current techniques and emerging trends for accelerating edge AI, highlighting integration methods and open challenges in the field.

Findings

01. Model compression techniques significantly reduce model size and inference latency.

02. Neural Architecture Search automates the discovery of hardware-efficient models.

03. Deployment frameworks enable hardware-specific optimizations for edge devices.

Abstract

Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. In this survey, we present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints. First, we examine model compression techniques (pruning, quantization, tensor decomposition, and knowledge distillation) that streamline large models into smaller, faster, and more efficient variants. Next, we explore Neural Architecture Search (NAS), a class of automated methods that discover architectures inherently optimized for particular tasks and hardware budgets. We then discuss compiler and deployment frameworks, such as TVM, TensorRT, and OpenVINO, which provide hardware-tailored optimizations at inference time. By integrating these three pillars into unified pipelines, practitioners can achieve…
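
As a rough illustration of two of the compression techniques named in the abstract, the sketch below applies magnitude pruning followed by symmetric int8 quantization to a flat list of weights. It is not taken from the survey: the function names, the 50% sparsity target, the per-tensor scaling scheme, and the sample weights are all assumptions chosen for brevity, and real toolchains typically work per layer or per channel with calibration data.

```python
# Illustrative sketch only; names and parameters are assumptions, not the survey's method.

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


weights = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]  # hypothetical layer weights
sparse = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(sparse)
restored = dequantize(q, scale)

print("pruned:   ", sparse)
print("int8 code:", q)
print("restored: ", restored)
```

In this toy setup the three smallest weights are zeroed and the rest are stored as 8-bit integers plus one scale factor, so each restored value differs from its pruned original by at most half a quantization step.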

Taxonomy

Topics: IoT and Edge/Fog Computing

Methods: Pruning