Two-level Data Staging ETL for Transaction Data
Xiufeng Liu

TL;DR
This paper introduces a two-level data staging ETL approach that efficiently manages high-frequency transaction data changes, including insertions, updates, deletions, and early-arriving data, to improve data warehousing processes.
Contribution
It presents a novel two-level staging method that detects data changes, assigns operation codes, and optimizes processing of various data change types in ETL workflows.
Findings
Efficient handling of high-frequency transaction data changes.
Improved processing of early-arriving data.
Enhanced change detection and operation coding.
Abstract
In data warehousing, Extract-Transform-Load (ETL) regularly extracts data from data sources into a central data warehouse to support business decision-making. Data from transaction processing systems is characterized by frequent insertions, updates, and deletions. It is challenging for ETL to propagate these changes to the data warehouse and to maintain the change history. Moreover, ETL jobs typically run sequentially when processing data with dependencies, which is not optimal, e.g., when processing early-arriving data. In this paper, we propose a two-level data staging ETL for handling transaction data. The proposed method detects changes in the data from transaction processing systems, identifies the corresponding operation codes for the changes, and uses two staging databases to facilitate data processing in an ETL process. The…
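To make the idea of change detection with operation codes concrete, the following is a minimal sketch, not the paper's implementation. It assumes changes are found by comparing two snapshots of a source table keyed by primary key, and it uses the hypothetical codes "I", "U", and "D" for insertion, update, and deletion. The function name `detect_changes`, the row layout, and the snapshot-comparison technique itself are illustrative assumptions; the excerpt above does not specify how the proposed method detects changes.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation code, primary key, row)


def detect_changes(previous: Dict[int, Row], current: Dict[int, Row]) -> List[Change]:
    """Compare two snapshots and tag each difference with an operation code.

    Hypothetical codes: "I" = insertion, "U" = update, "D" = deletion.
    """
    changes: List[Change] = []
    # Rows present now: either new (insert) or modified (update).
    for key, row in current.items():
        if key not in previous:
            changes.append(("I", key, row))
        elif row != previous[key]:
            changes.append(("U", key, row))
    # Rows that disappeared since the previous snapshot are deletions.
    for key, row in previous.items():
        if key not in current:
            changes.append(("D", key, row))
    return changes


# Illustrative data only, not taken from the paper.
previous = {1: {"amount": 10}, 2: {"amount": 20}, 3: {"amount": 30}}
current = {1: {"amount": 10}, 2: {"amount": 25}, 4: {"amount": 40}}
for code, key, row in detect_changes(previous, current):
    print(code, key, row)
```

In a staged ETL flow, tagged records like these could be written to a staging area first and applied to the warehouse later, so that each change type is handled by its own code path.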
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Cloud Computing and Resource Management
