Serverless Data Analytics with Flint

Youngbin Kim; Jimmy Lin

arXiv:1803.06354 · cs.DC · October 11, 2018 · 5 cites

TL;DR

This paper introduces Flint, a serverless Spark execution engine leveraging AWS Lambda, enabling cost-effective big data analytics without traditional Spark clusters.

Contribution

The paper presents the design and implementation of Flint, a serverless analytics engine that runs existing PySpark code on AWS Lambda, simplifying big data processing.

Findings

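01 Flint achieves performance comparable to that of traditional Spark clusters.

02 Flint offers a pay-as-you-go cost model for big data analytics: users pay only for the function invocations a job actually uses, with no standing cluster.

03 The paper identifies the challenges of serverless execution for data analytics and describes how Flint's design addresses them.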
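The pay-as-you-go model can be made concrete with a small cost calculator. This is a minimal sketch, assuming AWS Lambda's billing scheme of a per-request fee plus a charge per GB-second (allocated memory times billed duration). The default rates are illustrative assumptions, not figures from the paper, so check current AWS pricing before relying on them. `LambdaPricing`, `invocation_cost`, and `job_cost` are hypothetical helpers, not part of Flint.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LambdaPricing:
    # Illustrative rates (assumption): verify against current AWS Lambda pricing.
    usd_per_gb_second: float = 0.0000166667
    usd_per_million_requests: float = 0.20
    # Billing granularity in milliseconds; Lambda billed in 100 ms steps in 2018.
    billing_increment_ms: int = 1


def invocation_cost(duration_ms: float, memory_mb: int, p: LambdaPricing) -> float:
    """Cost of one invocation: billed duration times allocated memory, plus the request fee."""
    # Round the run time up to the next billing increment.
    billed_ms = math.ceil(duration_ms / p.billing_increment_ms) * p.billing_increment_ms
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * p.usd_per_gb_second + p.usd_per_million_requests / 1_000_000


def job_cost(task_durations_ms: Iterable[float], memory_mb: int,
             p: LambdaPricing = LambdaPricing()) -> float:
    """Pay-as-you-go: a job costs the sum of its task invocations; idle time costs nothing."""
    return sum(invocation_cost(d, memory_mb, p) for d in task_durations_ms)
```

For example, `job_cost([20_000] * 80, memory_mb=3008)` estimates a job of 80 tasks that each run for 20 s with 3008 MB allocated. Passing `LambdaPricing(billing_increment_ms=100)` approximates the coarser billing granularity in effect when the paper was written.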

Abstract

Serverless architectures organized around loosely-coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this paper, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing an actual Spark cluster. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.
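The abstract describes running Spark jobs as function invocations instead of on a cluster. The following is a minimal sketch of that execution model for a word-count job, assuming a two-stage plan. A driver fans each stage out as concurrent task invocations, simulated here with a local thread pool rather than AWS Lambda. Map tasks hand intermediate data to reducers through per-reducer queues, which stand in for whatever external service the engine uses to shuffle data between invocations. `NUM_REDUCERS`, `map_task`, `reduce_task`, and `run_word_count` are hypothetical names, not Flint's API.

```python
import queue
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

NUM_REDUCERS = 2  # hypothetical fan-in for the reduce stage


def map_task(lines: List[str], shuffle: List[queue.Queue]) -> None:
    """One simulated invocation: count words in a partition, then route each
    (word, count) pair to a reducer chosen by hashing the key."""
    counts = Counter(w for line in lines for w in line.split())
    for word, n in counts.items():
        shuffle[hash(word) % len(shuffle)].put((word, n))


def reduce_task(q: queue.Queue) -> Dict[str, int]:
    """Another simulated invocation: drain one shuffle queue and merge partial counts."""
    totals: Counter = Counter()
    while True:
        try:
            word, n = q.get_nowait()
        except queue.Empty:
            break
        totals[word] += n
    return dict(totals)


def run_word_count(partitions: List[List[str]]) -> Dict[str, int]:
    """Driver: fan out one task per input partition, wait at the stage barrier,
    then fan out one task per reducer and merge their disjoint outputs."""
    shuffle = [queue.Queue() for _ in range(NUM_REDUCERS)]
    with ThreadPoolExecutor() as pool:
        # Stage 1: map tasks run concurrently; list() blocks until all finish.
        list(pool.map(lambda part: map_task(part, shuffle), partitions))
        # Stage 2: reducers start only after every map task has written its output.
        results = list(pool.map(reduce_task, shuffle))
    merged: Dict[str, int] = {}
    for part in results:
        merged.update(part)  # each word was routed to exactly one reducer
    return merged
```

Because every key hashes to a single reducer, the reducer outputs are disjoint and the driver can merge them with a plain dictionary update. In a real serverless engine, each task would be a separate stateless invocation, so intermediate data must live outside the function, which is what the queues model here.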

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Advanced Data Storage Technologies