Integrazione di Apache Hive con Spark

Michele Gentile; Massimiliano Morrelli

arXiv:1901.06238 · cs.DB · January 21, 2019 · 1 citation

Open Access

TL;DR

This paper describes integrating Apache Hive with Spark through the Hive Warehouse Connector, so that data can be transferred between distributed SQL and NoSQL storage systems and analyzed using Spark's distributed computing.

Contribution

The paper applies the Hive Warehouse Connector APIs to provide interoperability and data operations between Hive and Spark, and to make the stored data available for consultation outside the Ambari cluster.

Findings

01. Successful implementation of Hive-Spark integration

02. Improved data transfer efficiency between systems

03. Enhanced data accessibility for analysis

Abstract

English. This document describes the solutions adopted to meet the need to transfer large amounts of information between the best-known distributed SQL and NoSQL storage systems, so that analysis and/or modification operations can exploit the particular strengths of each. The goal was achieved using the Spark engine and by studying and applying the open source "Hive Warehouse Connector" library made by Hortonworks, which provides new interoperability features between Hive and Spark. These APIs were chosen in order to take advantage of Spark's distributed computing through the Spark SQL libraries, to allow fast reading and writing on the databases chosen by the Network Contacts Systems Engineering Team, and to make the stored information available for consultation outside the Ambari cluster.

Italiano. Il presente documento descrive le soluzioni adottate, nate dalla necessità…
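As a rough illustration of what pointing Spark at Hive through the Hive Warehouse Connector involves, the sketch below collects the Spark properties that the connector's documentation lists and renders them as `spark-submit` arguments. It is not taken from the paper: the helper names `hwc_spark_conf` and `as_submit_args` are hypothetical, and every host name, port, and path is a placeholder assumption.

```python
# Minimal sketch, not the authors' setup: gather the Spark properties that the
# Hive Warehouse Connector (HWC) documentation describes, then render them as
# spark-submit arguments. All values passed in below are placeholders.


def hwc_spark_conf(jdbc_url, metastore_uri, staging_dir, llap_hosts, zk_quorum):
    """Return the HWC-related Spark properties as a plain dict."""
    return {
        # JDBC endpoint of HiveServer2 (Interactive) used to run queries
        "spark.sql.hive.hiveserver2.jdbc.url": jdbc_url,
        # Thrift URI of the Hive metastore
        "spark.datasource.hive.warehouse.metastoreUri": metastore_uri,
        # HDFS directory used as staging area for batch writes to Hive
        "spark.datasource.hive.warehouse.load.staging.dir": staging_dir,
        # Application name of the LLAP daemons
        "spark.hadoop.hive.llap.daemon.service.hosts": llap_hosts,
        # ZooKeeper quorum used by Hive
        "spark.hadoop.hive.zookeeper.quorum": zk_quorum,
    }


def as_submit_args(conf):
    """Turn a property dict into a flat list of '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Placeholder cluster coordinates (assumptions, replace with real ones).
EXAMPLE_CONF = hwc_spark_conf(
    jdbc_url="jdbc:hive2://hs2.example.com:10500/",
    metastore_uri="thrift://metastore.example.com:9083",
    staging_dir="/tmp/hwc_staging",
    llap_hosts="@llap0",
    zk_quorum="zk1.example.com:2181",
)

if __name__ == "__main__":
    print(" ".join(as_submit_args(EXAMPLE_CONF)))

# On a cluster with the HWC jar and Python package available, a session is
# typically opened along these lines (not executed here):
#   from pyspark_llap import HiveWarehouseSession
#   hive = HiveWarehouseSession.session(spark).build()
#   df = hive.executeQuery("SELECT * FROM some_table")  # hypothetical table
```

Keeping the properties in one dict makes it easy to reuse the same settings for `spark-submit`, `spark-shell`, or a programmatic `SparkSession` builder; which of them the authors used is not stated in the abstract.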

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Advanced Database Systems and Queries