Efficient Routing for Cost Effective Scale-out Data Architectures
Ashwin Narayan, Vuk Markovic, Natalia Postawa, Anna King, Alejandro Morales, K. Ashwin Kumar, Petros Efstathopoulos

TL;DR
This paper introduces an efficient routing technique for large-scale data architectures that reduces query processing time and minimizes machine usage by leveraging query correlation and clustering, outperforming existing methods.
Contribution
It proposes an incremental set cover-based routing method that speeds up multi-query routing and reduces machine contact, improving over traditional repeated greedy algorithms.
Findings
2.5 times faster routing speed
50% fewer machines contacted per query
Effective query clustering reduces set cover computation time
Abstract
Efficient retrieval of information is of key importance when using Big Data systems. In large scale-out data architectures, data are distributed and replicated across several machines. Queries or tasks submitted to such architectures are sent to a router, which determines the machines containing the requested data. Ideally, to reduce the overall cost of analytics, the router should return the smallest set of machines required to satisfy the query. Mathematically, this can be modeled as the set cover problem, which is NP-hard, so routing becomes a balance between optimality and performance. Even though an efficient greedy approximation algorithm exists for routing a single query, there is currently no better method for processing multiple queries than running the greedy set cover algorithm repeatedly for each query. This method is impractical for Big Data systems and the…
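To make the set cover framing concrete, here is a minimal sketch of the single-query greedy baseline the abstract refers to. It is not the paper's incremental method. The function name `greedy_route`, the `placement` mapping, and the machine and item names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_route(
    query: Iterable[Hashable],
    placement: Dict[str, Set[Hashable]],
) -> List[str]:
    """Pick machines for one query with the classic greedy set cover heuristic.

    `placement` maps a (hypothetical) machine name to the data items it stores.
    The result is a small, but not necessarily minimum, set of machines.
    """
    uncovered = set(query)
    chosen: List[str] = []
    while uncovered:
        # Choose the machine holding the most still-uncovered items.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(placement), key=lambda m: len(placement[m] & uncovered))
        gain = placement[best] & uncovered
        if not gain:
            raise ValueError(f"no machine stores: {sorted(map(str, uncovered))}")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Illustrative data: three machines with overlapping replicas.
placement = {
    "m1": {"a", "b"},
    "m2": {"b", "c", "d"},
    "m3": {"d", "e"},
}
route_small = greedy_route({"b", "c"}, placement)
route_full = greedy_route({"a", "b", "c", "d", "e"}, placement)
```

Routing many queries by calling such a function once per query is the repeated greedy approach that the abstract describes as impractical at scale.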
