FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge

Sifat Ut Taki; Arthi Padmanabhan; Spyridon Mastorakis

arXiv:2410.21120 · cs.LG · October 29, 2024



TL;DR

FusedInf introduces a method to combine multiple DNN models into a single DAG, enabling efficient swapping and faster, more memory-efficient inference on resource-constrained edge devices for serverless AI services.

Contribution

The paper proposes FusedInf, an approach that merges multiple DNN models into a single DAG to optimize model loading and execution on edge AI hardware for serverless inference.

Findings

01. Up to 14% faster model execution

02. Memory reduction of up to 17%

03. Effective model swapping on edge devices

Abstract

Edge AI computing boxes are a new class of computing devices that aim to revolutionize the AI industry. These compact and robust hardware units bring the power of AI processing directly to the source of data, at the edge of the network. At the same time, on-demand serverless inference services are becoming increasingly popular because they minimize the infrastructure cost of hosting and running DNN models for small to medium-sized businesses. However, these computing devices are still constrained in terms of resource availability, so service providers need to load and unload models efficiently in order to meet the growing demand. In this paper, we introduce FusedInf to efficiently swap DNN models for on-demand serverless inference services on the edge. FusedInf combines multiple models into a single Directed Acyclic Graph (DAG) to efficiently load the models…
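The idea of combining several models into one DAG can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each model is described as a plain dependency graph (layer name mapped to the layers it depends on), and the model names `resnet` and `mobilenet`, the layer names, and the helpers `fuse_models` and `execution_order` are all hypothetical. It only shows that namespaced per-model graphs merge into one acyclic graph that can be scheduled in a single pass.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def fuse_models(models):
    """Merge per-model dependency graphs into one DAG.

    `models` maps a model name to its graph; each graph maps a layer
    name to the list of layers it depends on. Nodes are prefixed with
    the model name so layers from different models never collide.
    (Hypothetical helper for illustration only.)
    """
    fused = {}
    for model_name, graph in models.items():
        for node, deps in graph.items():
            fused[f"{model_name}/{node}"] = [f"{model_name}/{d}" for d in deps]
    return fused


def execution_order(fused):
    """Return one valid order in which every node follows its dependencies."""
    return list(TopologicalSorter(fused).static_order())


# Two toy models with made-up layer names.
models = {
    "resnet": {"conv": [], "pool": ["conv"], "fc": ["pool"]},
    "mobilenet": {"conv": [], "fc": ["conv"]},
}

fused = fuse_models(models)
order = execution_order(fused)
print(order)
```

Because the sub-graphs share no edges, the merged graph stays acyclic, and the whole set of models can be treated as a single unit to load and run. How FusedInf actually builds and executes its DAG is described in the paper and the linked repository.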


Code & Models

Repositories

sifattaj/fusedinf (PyTorch, official)


Taxonomy

Topics: Caching and Content Delivery · IoT and Edge/Fog Computing · Opportunistic and Delay-Tolerant Networks
