AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs
Hangcheng An, Rui Wang, Depei Qian

TL;DR
AccelSync is a static verification tool that formalizes and checks synchronization correctness in AI accelerator pipeline programs, effectively detecting hazards with high accuracy and low cost.
Contribution
It formalizes accelerator pipeline semantics, proves barrier sufficiency decidability, and implements a sound, complete verification tool for real-world hardware.
Findings
Identifies previously unknown synchronization hazards in production kernels.
Detects hazards in 19.2% of LLM-generated kernels.
Achieves 100% detection on mutants, outperforming msSanitizer.
Abstract
AI accelerator operators are compiled into multi-stage pipeline programs where DMA, vector, matrix, and scalar units execute concurrently on shared on-chip buffers. A missing or misplaced synchronization primitive introduces hardware-visible data races that escape both simulation and golden testing, because neither models the accelerator's cross-unit visibility semantics. We formalize accelerator pipeline programs as a restricted concurrent language, define a parameterized hardware event semantics with three ordering relations -- program order, synchronization order, and barrier order -- and reduce the correctness question to barrier sufficiency: whether every cross-unit write-read pair on the same buffer is ordered by happens-before. Here "barrier" denotes an abstract ordering primitive in the model, covering vendor pipe barriers, hard-event synchronization, and equivalent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
