Online Query Scheduling on Source Permutation for Big Data Integration

Zimu Yuan; Shusheng Guo

arXiv:1503.08400 · cs.DB · March 31, 2015

TL;DR

This paper presents an online query scheduling strategy for big data integration that optimizes source permutation and statistics estimation to minimize retrieval time in real-time scenarios.

Contribution

It introduces an online scheduling approach in which statistics improvement, source permutation construction, and query execution run in parallel for efficient big data integration.

Findings

01 High efficiency demonstrated in experiments

02 Scalability of the scheduling strategy confirmed

03 Effective minimization of query response time

Abstract

Big data integration can involve a large number of sources with unpredictable redundancy among them. Building a central warehouse to integrate data from all sources then becomes infeasible, because the sources are so numerous and are continuously updated. A practical alternative is online query scheduling, which queries the sources at runtime when a query arrives. In this paper, we address the Time-Cost Minimization Problem for online query scheduling and tackle the challenges of source permutation and statistics estimation, in order to minimize the time cost of retrieving answers for queries received in real time. We propose an online scheduling strategy in which statistics improvement, source permutation construction, and query execution work in parallel. Experimental results show high efficiency and…
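To make the idea of a source permutation concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the abstract does not spell out. It swaps in a simple greedy heuristic: order sources by estimated new answers per unit of estimated time, then query them in that order with a small thread pool and drop redundant answers. The `Source` class, its `est_answers` and `est_cost` fields, and the `fetch` callback are all hypothetical names introduced for illustration, and the statistics are assumed to be given rather than estimated online.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    # All fields are hypothetical stand-ins for per-source statistics.
    name: str
    est_answers: frozenset  # answers we expect this source to return
    est_cost: float         # estimated seconds to query it (assumed > 0)


def greedy_permutation(sources):
    """Order sources by estimated new answers per unit of time (greedy heuristic)."""
    remaining = list(sources)
    covered = set()
    order = []
    while remaining:
        # Pick the source with the best marginal gain per unit of estimated cost.
        best = max(remaining, key=lambda s: len(s.est_answers - covered) / s.est_cost)
        order.append(best)
        covered |= best.est_answers
        remaining.remove(best)
    return order


def run_query(order, fetch, max_workers=2):
    """Query sources concurrently, merging results in permutation order without duplicates."""
    answers = []
    seen = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the sources were submitted.
        for result in pool.map(fetch, order):
            for item in result:
                if item not in seen:
                    seen.add(item)
                    answers.append(item)
    return answers


sources = [
    Source("A", frozenset({1, 2, 3}), 3.0),
    Source("B", frozenset({3, 4}), 1.0),
    Source("C", frozenset({1, 2}), 2.0),
]
order = greedy_permutation(sources)
# The fetch callback here simply returns the estimated answers; a real one would contact the source.
answers = run_query(order, fetch=lambda s: sorted(s.est_answers))
```

In this toy input, source A is queried last because the answers it is expected to return are already covered by B and C at lower estimated cost. That is the kind of redundancy a source permutation is meant to exploit.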

Taxonomy

Topics: Data Management and Algorithms · Data Quality and Management · Advanced Database Systems and Queries