Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
Amine Barrak, Emna Ksontini

TL;DR
This paper demonstrates that serverless parallel processing significantly accelerates large-scale ML inference tasks like sentiment analysis, reducing execution time by over 95% while maintaining cost efficiency.
Contribution
It introduces a scalable, cost-effective approach for ML inference using serverless functions to decompose and parallelize monolithic processes.
Findings
Over 95% reduction in execution time compared to monolithic methods
Cost remains comparable to traditional approaches
Effective for large-scale sentiment analysis tasks
Abstract
As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
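The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a local thread pool stands in for concurrent serverless invocations, and `classify_batch` is a hypothetical placeholder that uses a toy keyword rule instead of DistilBERT so the example stays self-contained. The names `chunk`, `classify_batch`, and `parallel_inference`, as well as the batch size and worker count, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def classify_batch(batch):
    """Hypothetical stand-in for one serverless function invocation.

    A real deployment would load a sentiment model (e.g. DistilBERT) here;
    this toy keyword rule is an assumption used only to keep the sketch runnable.
    """
    positive_words = {"great", "good", "excellent", "loved"}
    return [
        "positive" if positive_words & set(text.lower().split()) else "negative"
        for text in batch
    ]


def parallel_inference(reviews, batch_size=2, max_workers=4):
    """Fan batches out to workers, then merge results in the original order."""
    batches = chunk(reviews, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches
        results = pool.map(classify_batch, batches)
        return [label for batch_labels in results for label in batch_labels]


reviews = [
    "A great film",
    "Dull and slow",
    "I loved it",
    "Not worth watching",
    "Excellent cast",
]
labels = parallel_inference(reviews)
print(labels)
```

In this sketch each batch is independent, so the work can be spread across as many workers as there are batches. That independence is the property the serverless approach relies on, with the thread pool replaced by function invocations on a cloud platform.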
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Layer
