Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm

Nandan Mirajkar; Sandeep Bhujbal; Aaradhana Deshmukh

arXiv:1307.1517·cs.DC·July 8, 2013·5 cites

TL;DR

This paper demonstrates executing a word count Map-Reduce job on a single-node Hadoop cluster and compressing data with the LZO algorithm to optimize storage and processing of large datasets.

Contribution

It presents a practical implementation of Map-Reduce and data compression on a single-node Hadoop setup, illustrating data reduction techniques for large-scale data management.

Findings

01. Successful execution of a word count Map-Reduce job on a single node

02. Effective data compression using the LZO algorithm

03. Reduced storage requirements for large datasets

Abstract

Applications such as Yahoo, Facebook, and Twitter hold huge volumes of data that must be stored and retrieved on client request. Storing this data requires very large databases, which increases physical storage needs and complicates the analysis that business growth depends on. Apache Hadoop can reduce the storage required and process the data in a distributed fashion: it uses the Map-Reduce algorithm and combines repeated data so that the whole dataset is stored in a reduced form. This paper describes performing a wordcount Map-Reduce job on a single-node Apache Hadoop cluster and compressing data using the Lempel-Ziv-Oberhumer (LZO) algorithm.
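The word count job named in the abstract can be illustrated with a minimal in-memory sketch of the map, shuffle, and reduce phases. This is not the paper's Hadoop code: it runs in plain Python without a cluster, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen only for illustration. LZO compression is left out because it needs a third-party library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum all counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file stored in HDFS.
    sample = ["hadoop stores big data", "hadoop processes big data"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a real cluster the mapper and reducer would run as separate tasks and the framework would do the grouping. The sketch keeps everything in one process so that the data flow is easy to follow.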

Taxonomy

Topics: Cloud Computing and Resource Management · Data Stream Mining Techniques · Advanced Database Systems and Queries