Online Query Scheduling on Source Permutation for Big Data Integration
Zimu Yuan, Shusheng Guo

TL;DR
This paper presents an online query scheduling strategy for big data integration that optimizes source permutation and statistics estimation to minimize retrieval time in real-time scenarios.
Contribution
It introduces an online scheduling approach in which statistics improvement, source permutation construction, and query execution run in parallel for efficient big data integration.
Findings
High efficiency demonstrated in experiments
Scalability of the scheduling strategy confirmed
Effective minimization of query response time
Abstract
Big data integration can involve a large number of sources with unpredictable redundancy among them. Building a central warehouse that integrates data from all sources then becomes infeasible, because the sources are so numerous and are updated continuously. A practical alternative is online query scheduling, which queries the sources at runtime when a query arrives. In this paper, we address the Time-Cost Minimization Problem for online query scheduling, and tackle the challenges of source permutation and statistics estimation to minimize the time cost of retrieving answers for queries received in real time. We propose an online scheduling strategy in which statistics improvement, source permutation construction, and query execution proceed in parallel. Experimental results show high efficiency and…
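To make the idea of a source permutation concrete, the following is a minimal sketch of one common heuristic: greedily ordering sources by estimated new answers per unit of time. It is not the authors' algorithm, which the abstract does not specify. The `Source` class, its `answers` and `cost` fields, and the `greedy_permutation` function are hypothetical names introduced for illustration, and the sketch assumes each source's answer set and query time are known exactly, whereas in practice they would be estimated statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical description of one data source (illustration only)."""
    name: str
    answers: frozenset  # assumed: ids of the answers this source would return
    cost: float         # assumed: time needed to query this source


def greedy_permutation(sources):
    """Order sources by new answers gained per unit time, re-evaluated
    after each pick so that redundant sources sink to the end."""
    remaining = list(sources)
    covered = set()
    order = []
    while remaining:
        # Marginal gain counts only answers not already retrieved.
        best = max(remaining, key=lambda s: len(s.answers - covered) / s.cost)
        order.append(best)
        covered |= best.answers
        remaining.remove(best)
    return order


sources = [
    Source("A", frozenset({1, 2, 3, 4}), cost=4.0),
    Source("B", frozenset({3, 4, 5}), cost=1.0),
    Source("C", frozenset({1, 2}), cost=1.0),
]
order = [s.name for s in greedy_permutation(sources)]
print(order)
```

In this toy input, source A holds the most answers but is slow and fully redundant once B and C have been queried, so the heuristic places it last. An online strategy of the kind the abstract describes would additionally refine the statistics while queries are running, which this sketch does not attempt.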
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Advanced Database Systems and Queries
