Kernel Contracts: A Specification Language for ML Kernel Correctness Across Heterogeneous Silicon
Cooper Veit

TL;DR
This paper introduces a formal specification language for defining and diagnosing kernel correctness contracts across heterogeneous silicon platforms, addressing inconsistencies in ML kernel behaviors.
Contribution
It presents a comprehensive contract language with eight components, grounded in empirical evidence, enabling precise diagnosis of kernel violations across different hardware.
Findings
Applied framework to three documented incidents, mapping informal diagnoses to specific contract violations.
Defined twelve contract classes covering various failure modes with measurable signatures.
Established a calibration process requiring implementations that conform and violate contracts.
Abstract
Every ML kernel ships with an implicit contract about what it computes. People rarely write the contract down. When two kernels disagree -- when a matmul on AMD produces a different gradient than the same matmul on NVIDIA, when a fused attention kernel silently downcasts an accumulator, when an out-of-bounds access returns zero on one stack and garbage on another -- there is no formal artifact to arbitrate the dispute. Recent empirical work has measured the gap across silicon platforms, but none of it specifies the contract being violated. We present a specification language for kernel contracts. A contract has eight parts: identifier, scope, precondition, postcondition, tolerance, reference oracle, measurement protocol, and violation signature. We use it to state twelve contract classes covering precision, ordering, compiler-induced, and exceptional-value failure modes, each grounded…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
