Efficient Routing for Cost Effective Scale-out Data Architectures

Ashwin Narayan; Vuk Markovic; Natalia Postawa; Anna King; Alejandro Morales; K. Ashwin Kumar; Petros Efstathopoulos

arXiv:1606.08884 · cs.DB · June 30, 2016

TL;DR

This paper introduces an efficient routing technique for large-scale data architectures that reduces query processing time and minimizes machine usage by leveraging query correlation and clustering, outperforming existing methods.

Contribution

It proposes an incremental set cover-based routing method that speeds up the routing of multiple queries and reduces the number of machines each query contacts, improving on the traditional approach of rerunning the greedy set cover algorithm for every query.

Findings

01  2.5 times faster routing speed

02  50% fewer machines contacted per query

03  Effective query clustering reduces set cover computation time

Abstract

Efficient retrieval of information is of key importance when using Big Data systems. In large scale-out data architectures, data are distributed and replicated across several machines. Queries and tasks submitted to such architectures are sent to a router, which determines the machines containing the requested data. Ideally, to reduce the overall cost of analytics, the router should return the smallest set of machines required to satisfy the query. Mathematically, this can be modeled as the set cover problem, which is NP-hard, so the routing process must balance optimality against performance. Even though an efficient greedy approximation algorithm exists for routing a single query, there is currently no better method for processing multiple queries than running the greedy set cover algorithm repeatedly for each query. This method is impractical for Big Data systems and the…
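
To make the single-query baseline in the abstract concrete, here is a minimal sketch of greedy set cover used as a router. It is the textbook greedy approximation, not the paper's incremental method. The function name `greedy_route`, the `placement` mapping, and the machine and item names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_route(query: Set[str], placement: Dict[str, Set[str]]) -> List[str]:
    """Pick machines greedily until every item in `query` is covered.

    `placement` maps a (hypothetical) machine name to the data items it stores.
    This is the standard greedy set cover heuristic, shown as an assumption
    about what the "repeated greedy" baseline looks like for one query.
    """
    uncovered = set(query)
    chosen: List[str] = []
    while uncovered:
        # Choose the machine holding the most still-uncovered items.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(
            sorted(placement),
            key=lambda m: len(placement[m] & uncovered),
            default=None,
        )
        gained = placement[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError(f"items not stored on any machine: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Illustrative data: four machines with overlapping (replicated) items.
example_placement = {
    "m1": {"a", "b", "c"},
    "m2": {"c", "d"},
    "m3": {"d", "e"},
    "m4": {"a", "e"},
}
example_route = greedy_route({"a", "b", "d", "e"}, example_placement)
print(example_route)  # ['m1', 'm3']
```

Routing many queries with this baseline means calling `greedy_route` once per query, each call scanning every machine in every round. That repeated cost is what the abstract describes as impractical at scale.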
