Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments
Kangning Gao, Yi Hu, Cong Nie, Wei Li

TL;DR
This paper introduces a deep Q-learning framework for optimizing ETL scheduling in heterogeneous data environments, significantly improving efficiency, resource utilization, and adaptability in complex data processing systems.
Contribution
It presents a novel reinforcement learning-based scheduling model that dynamically adapts to complex, high-dimensional data environments for ETL processes.
Findings
Reduces scheduling delay significantly
Improves system throughput and stability
Demonstrates robustness under various conditions
Abstract
This paper addresses the challenges of low scheduling efficiency, unbalanced resource allocation, and poor adaptability in ETL (Extract-Transform-Load) processes under heterogeneous data environments by proposing an intelligent scheduling optimization framework based on deep Q-learning. The framework formalizes the ETL scheduling process as a Markov Decision Process and enables adaptive decision-making by a reinforcement learning agent in high-dimensional state spaces to dynamically optimize task allocation and resource scheduling. The model consists of a state representation module, a feature embedding network, a Q-value estimator, and a reward evaluation mechanism, which collectively consider task dependencies, node load states, and data flow characteristics to derive the optimal scheduling strategy in complex environments. A multi-objective reward function is designed to balance key…
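To make the Markov Decision Process framing in the abstract concrete, here is a minimal sketch of Q-learning applied to a toy scheduling problem. It is not the authors' model: a lookup table stands in for the feature embedding network and deep Q-value estimator, and task dependencies and data flow characteristics are left out. The task durations, node speeds, reward terms, and reward weights are all hypothetical. In particular, the abstract is cut off before it names the objectives its reward function balances, so the two terms used here (makespan growth and load imbalance) are illustrative placeholders only.

```python
import random

# Hypothetical toy setup: task durations and relative node speeds are made up.
TASKS = [4, 2, 6, 3, 5]
NODE_SPEED = [1.0, 2.0]


def step(loads, task_idx, action):
    """Assign task `task_idx` to node `action`; return (new_loads, reward)."""
    new_loads = list(loads)
    new_loads[action] += TASKS[task_idx] / NODE_SPEED[action]
    # Illustrative two-term reward (assumed, not from the paper):
    # penalize growth of the makespan and imbalance between node loads.
    delay = max(new_loads) - max(loads)
    imbalance = max(new_loads) - min(new_loads)
    reward = -(1.0 * delay + 0.5 * imbalance)
    return tuple(new_loads), reward


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) -> estimated return
    n_actions = len(NODE_SPEED)
    for _ in range(episodes):
        loads = (0.0,) * n_actions
        for i in range(len(TASKS)):
            # State: index of the next task plus the current node loads.
            state = (i, loads)
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions),
                             key=lambda a: q.get((state, a), 0.0))
            loads, reward = step(loads, i, action)
            next_state = (i + 1, loads)
            done = i + 1 == len(TASKS)
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in range(n_actions))
            # Standard Q-learning temporal-difference update.
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q


def greedy_schedule(q):
    """Follow the learned Q-values greedily; return (plan, makespan)."""
    loads = (0.0,) * len(NODE_SPEED)
    plan = []
    for i in range(len(TASKS)):
        state = (i, loads)
        action = max(range(len(NODE_SPEED)),
                     key=lambda a: q.get((state, a), 0.0))
        plan.append(action)
        loads, _ = step(loads, i, action)
    return plan, max(loads)


if __name__ == "__main__":
    plan, makespan = greedy_schedule(train())
    print("node per task:", plan, "makespan:", makespan)
```

A deep Q-learning variant would replace the dictionary `q` with a neural network that maps a state vector to one Q-value per action, which is what allows the approach to scale to the high-dimensional state spaces the abstract describes.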
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Big Data and Digital Economy
