Muppet: MapReduce-Style Processing of Fast Data

Wang Lam; Lu Liu; STS Prasad; Anand Rajaraman; Zoheb Vacheri; AnHai Doan

arXiv:1208.4175 · cs.DB · August 22, 2012 · 19 cites

Open Access

TL;DR

Muppet is a MapReduce-inspired framework designed specifically for fast data processing, enabling low-latency, scalable applications in social media and e-commerce environments.

Contribution

The paper introduces MapUpdate, a MapReduce-like framework developed specifically for fast data, and Muppet, its implementation. It also argues why traditional MapReduce is not well suited to low-latency processing of such data.

Findings

1. Muppet effectively processes fast data streams with low latency.
2. Extensive use at Kosmix and WalmartLabs demonstrates practical utility.
3. The framework supports a broad range of social media and e-commerce applications.

Abstract

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our…

Taxonomy

Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Data Quality and Management