Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions

Amine Barrak; Emna Ksontini

arXiv:2502.12017·cs.DC·February 18, 2025

Open Access

TL;DR

This paper demonstrates that serverless parallel processing significantly accelerates large-scale ML inference tasks like sentiment analysis, reducing execution time by over 95% while maintaining cost efficiency.

Contribution

It introduces a scalable, cost-effective approach for ML inference using serverless functions to decompose and parallelize monolithic processes.

Findings

1. Over 95% reduction in execution time compared to monolithic methods
2. Cost remains comparable to traditional approaches
3. Effective for large-scale sentiment analysis tasks

Abstract

As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
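The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`split_into_batches`, `toy_sentiment_handler`, `parallel_inference`, `monolithic_inference`) are hypothetical, a local thread pool stands in for concurrent serverless invocations, and a keyword rule stands in for DistilBERT so the example runs without external dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Cut the input into fixed-size batches; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def toy_sentiment_handler(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for one function invocation.

    Assumption: a real handler would load a model such as DistilBERT and
    classify each review; here a keyword rule keeps the sketch self-contained.
    """
    positive_words = {"great", "good", "loved", "excellent"}
    labels = []
    for review in batch:
        words = set(review.lower().split())
        labels.append("POSITIVE" if words & positive_words else "NEGATIVE")
    return labels


def parallel_inference(
    reviews: Sequence[str],
    handler: Callable[[List[str]], List[str]],
    batch_size: int,
    max_workers: int,
) -> List[str]:
    """Fan batches out to concurrent workers, then merge results in input order."""
    batches = split_into_batches(reviews, batch_size)
    # The thread pool is an assumption standing in for parallel serverless calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_batch = list(pool.map(handler, batches))  # map preserves batch order
    return [label for labels in per_batch for label in labels]


def monolithic_inference(
    reviews: Sequence[str],
    handler: Callable[[List[str]], List[str]],
) -> List[str]:
    """Baseline: one process handles the whole dataset sequentially."""
    return handler(list(reviews))


if __name__ == "__main__":
    sample = ["A great film", "Dull and slow", "I loved it"]
    print(parallel_inference(sample, toy_sentiment_handler, batch_size=2, max_workers=2))
```

Because each batch is independent, both paths produce the same labels; in this sketch only the wall-clock time would differ once the handler does real work. Batch size and worker count are the tuning knobs in such a setup, and the values shown are illustrative only.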

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems

Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Layer