DataOps-driven CI/CD for analytics repositories
Dmytro Valiaiev

TL;DR
This paper proposes a DataOps-aligned CI/CD framework, with a validation scorecard and controls, to improve data quality, governance, and collaboration in analytics repositories and to address the lack of a standardized way to implement DataOps.
Contribution
It introduces a novel DataOps Controls Scorecard and a modular CI/CD pipeline framework for automated validation of SQL analytics repositories.
Findings
The framework enforces key data quality controls.
The scorecard distills best practices into 12 testable controls.
Automated checks improve governance and collaboration.
Abstract
The proliferation of SQL for data processing has often occurred without the rigor of traditional software development, leading to siloed efforts, logic replication, and increased risk. This ad-hoc approach hampers data governance and makes validation nearly impossible. To address these challenges, organizations are adopting DataOps, a methodology that combines Agile, Lean, and DevOps principles to treat analytics pipelines as production systems. However, a standardized framework for implementing DataOps is lacking. This perspective proposes a qualitative design for a DataOps-aligned validation framework. It introduces a DataOps Controls Scorecard, derived from a multivocal literature review, which distills key concepts into twelve testable controls. These controls are then mapped to a modular, extensible CI/CD pipeline framework designed to govern a single source of truth (SOT) SQL…
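As a rough illustration of what mapping testable controls to an automated check could look like, the Python sketch below scores SQL files against a small scorecard. Everything in it is an assumption made for illustration: the two controls, their IDs (`C01`, `C02`), and the function names are hypothetical and are not taken from the paper, whose twelve controls are not listed in this summary.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Control:
    control_id: str                 # hypothetical ID, not from the paper
    description: str
    check: Callable[[str], bool]    # returns True when the SQL text passes


def no_select_star(sql: str) -> bool:
    """Pass when the query never uses SELECT * (illustrative rule)."""
    return re.search(r"\bselect\s+\*", sql, flags=re.IGNORECASE) is None


def has_header_comment(sql: str) -> bool:
    """Pass when the file opens with a '--' comment (illustrative rule)."""
    return sql.lstrip().startswith("--")


# Hypothetical scorecard: two example controls standing in for the paper's twelve.
CONTROLS = [
    Control("C01", "No SELECT * in published queries", no_select_star),
    Control("C02", "File starts with a descriptive comment", has_header_comment),
]


def score(files: dict[str, str]) -> dict[str, dict[str, bool]]:
    """Evaluate every control against every file; keys are paths, then control IDs."""
    return {
        path: {c.control_id: c.check(sql) for c in CONTROLS}
        for path, sql in files.items()
    }


def passed(scorecard: dict[str, dict[str, bool]]) -> bool:
    """A CI job could fail the build when any control fails for any file."""
    return all(all(results.values()) for results in scorecard.values())
```

In a pipeline, a step along these lines might read the repository's `.sql` files into a dictionary, call `score`, and exit with a non-zero status when `passed` returns `False`, so that a merge is blocked until the failing controls are fixed.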
Taxonomy
Topics: Big Data and Business Intelligence · Scientific Computing and Data Management · Advanced Database Systems and Queries
