Shift schema drift left: policy-aware compile-time contracts for typed JVM and Spark pipelines

Vittal Mirji

arXiv:2604.16986·cs.PL·April 21, 2026

TL;DR

This paper introduces a small Scala 3 framework for Spark data pipelines that checks schema compatibility under explicit policies at compile time and re-checks the actual DataFrame schema at runtime, so schema drift is caught before a job writes data.

Contribution

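It presents a policy-aware contract system that pairs a compile-time compatibility proof with a runtime schema re-check at the sink. This fills the gap between typed-Dataset layers, which require wholesale adoption, and table-level enforcement systems, which operate only at write time against a stored schema.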

Findings

01. Proves producer-to-contract structural compatibility at compile time.

02. Derives Spark schemas directly from contract types.

03. Re-checks DataFrame schemas at sink boundary before writing.

Abstract

Schema drift in data pipelines is often caught only when a job touches real data. Typed-Dataset layers close part of this gap but require wholesale adoption; table-level enforcement systems close another part but operate at write time against a stored schema. We present a small Scala 3 framework that occupies the seam: it proves producer-to-contract structural compatibility under explicit policies at compile time, derives Spark schemas from the same contract types, and re-checks the actual DataFrame schema at the sink boundary before write. The artifact fuses the compile-time witness with a policy-aware runtime comparator that adds a nested-collection-optionality check Spark's built-in comparators omit and implements structural subset semantics for backward- and forward-compatible field sets. Evaluation covers compile-time proofs, runtime policy tests, builder-path end-to-end tests, and…
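
The runtime half of this idea can be sketched without Spark. The following is a minimal illustration, not the paper's implementation: the schema model (`DType`, `Field`, `StructT`, `ArrayT`), the `Policy` names, and `SchemaCheck.violations` are all hypothetical, and the policy meanings are assumptions (here `Backward` lets the actual schema carry extra fields, `Forward` lets it omit contract fields), so the paper's definitions may differ. `ArrayT.containsNull` stands in for the element-optionality flag that a nested-collection check would inspect.

```scala
// Hypothetical mini schema model; a real pipeline would compare Spark schemas instead.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class ArrayT(element: DType, containsNull: Boolean) extends DType
final case class StructT(fields: List[Field]) extends DType
final case class Field(name: String, dtype: DType, nullable: Boolean)

// Assumed policy meanings (not taken from the paper):
//   Exact:    field sets must match
//   Backward: actual may have extra fields (contract is a subset of actual)
//   Forward:  actual may omit fields (actual is a subset of contract)
enum Policy:
  case Exact, Backward, Forward

object SchemaCheck:
  /** Returns human-readable violations; an empty list means compatible. */
  def violations(actual: StructT, contract: StructT, policy: Policy, path: String = ""): List[String] =
    val a = actual.fields.map(f => f.name -> f).toMap
    val c = contract.fields.map(f => f.name -> f).toMap
    val missing = (c.keySet -- a.keySet).toList.sorted
    val extra = (a.keySet -- c.keySet).toList.sorted
    val fieldSetErrors = policy match
      case Policy.Exact =>
        missing.map(n => s"${path}${n}: missing") ++ extra.map(n => s"${path}${n}: unexpected")
      case Policy.Backward => missing.map(n => s"${path}${n}: missing")
      case Policy.Forward => extra.map(n => s"${path}${n}: unexpected")
    // Fields present on both sides are always compared recursively.
    val shared = (a.keySet & c.keySet).toList.sorted
    val fieldErrors = shared.flatMap { n =>
      val af = a(n)
      val cf = c(n)
      val nullErr =
        if af.nullable && !cf.nullable then
          List(s"${path}${n}: nullable but contract requires non-null")
        else Nil
      nullErr ++ typeViolations(af.dtype, cf.dtype, policy, s"${path}${n}")
    }
    fieldSetErrors ++ fieldErrors

  private def typeViolations(a: DType, c: DType, policy: Policy, path: String): List[String] =
    (a, c) match
      case (ArrayT(ae, an), ArrayT(ce, cn)) =>
        // Nested-collection optionality: elements may be null only if the contract allows it.
        val optErr =
          if an && !cn then List(s"${path}[]: elements may be null but contract forbids it")
          else Nil
        optErr ++ typeViolations(ae, ce, policy, s"${path}[]")
      case (sa: StructT, sc: StructT) => violations(sa, sc, policy, s"${path}.")
      case _ if a == c => Nil
      case _ => List(s"${path}: type mismatch (${a} vs ${c})")

@main def demo(): Unit =
  val contract = StructT(List(
    Field("id", IntT, nullable = false),
    Field("tags", ArrayT(StringT, containsNull = false), nullable = false)
  ))
  val actual = StructT(List(
    Field("id", IntT, nullable = false),
    Field("tags", ArrayT(StringT, containsNull = true), nullable = false),
    Field("extra", StringT, nullable = true)
  ))
  // Under the assumed Backward policy the extra field is tolerated,
  // but the nullable array elements are still reported.
  SchemaCheck.violations(actual, contract, Policy.Backward).foreach(println)
```

Returning a list of violations rather than a Boolean lets a sink-boundary check report every mismatch with its field path before refusing the write.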
