Fidelity Probes for Specification-Code Alignment
Ferhat Erata, Hao Zhou, Luke Huan

TL;DR
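This paper introduces fidelity probes: natural-language questions generated from a code artifact, with code-derived ground-truth answers, that are answered from a candidate specification to measure and improve specification-code alignment. The method is demonstrated on a 15-program COBOL benchmark (AWS CardDemo).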
Contribution
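It generates fidelity probes either with an LLM reading the code or with a static-analysis pipeline over the code's control-flow, data-flow, and system-dependence graphs, and uses the resulting contradiction and coverage-gap rates to drive targeted specification edits and to predict convergence.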
Findings
Frozen-test specification fidelity rose from 0.63 to 0.94 over eight iterations on AWS CardDemo.
A two-state Markov fixed point predicts the plateau location from just four iterations of rate data.
Cross-model evaluation confirms convergence behavior is model-agnostic.
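As a rough illustration of the fixed-point finding, the sketch below assumes one plausible reading of a two-state Markov model: each probe is either agreeing or disagreeing, a disagreeing probe is repaired with probability `repair_rate` per iteration, and an agreeing probe regresses with probability `regress_rate`. The function names, the state definitions, and the rates used are all hypothetical and are not taken from the paper.

```python
def fixed_point(repair_rate, regress_rate):
    """Stationary agreeing fraction of the assumed two-state chain."""
    return repair_rate / (repair_rate + regress_rate)


def step(fidelity, repair_rate, regress_rate):
    """One iteration: agreeing probes may regress, disagreeing ones may be repaired."""
    return fidelity * (1.0 - regress_rate) + (1.0 - fidelity) * repair_rate


def trajectory(start, repair_rate, regress_rate, iterations):
    """Fidelity values over a number of iterations, beginning at `start`."""
    values = [start]
    for _ in range(iterations):
        values.append(step(values[-1], repair_rate, regress_rate))
    return values


if __name__ == "__main__":
    # Illustrative rates only; they are not measurements from the paper.
    print(fixed_point(0.4, 0.1))
    print(trajectory(0.5, 0.4, 0.1, 8))
```

Under these assumptions the plateau depends only on the two rates, not on the starting fidelity, which is why a few iterations of rate estimates could in principle locate it before the curve flattens.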
Abstract
We introduce fidelity probes: natural-language questions generated from a reference artifact with code-derived ground-truth answers, answered from a candidate specification. The fraction of agreeing probes, which we call the fidelity, decomposes into contradiction and coverage-gap rates that drive targeted spec edits to convergence. On a 15-program, roughly 12k-line COBOL benchmark (AWS CardDemo), we raise frozen-test specification fidelity from 0.63 to 0.94 over eight iterations, with the plateau location predicted by a two-state Markov fixed point from just four iterations of rate data. Probes come from an LLM reading the code or from a static-analysis pipeline over its control-flow, data-flow, and system-dependence graphs, with a tunable mixture. A probe-resampling protocol with a frozen held-out set gives a Hoeffding-bounded overfitting discriminant; our measured…
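The decomposition described in the abstract can be sketched as follows, assuming each probe ends in exactly one of three outcomes: the spec-derived answer agrees with the code-derived one, contradicts it, or the spec does not cover the question. The outcome labels and function names are hypothetical. The second helper is the standard two-sided Hoeffding half-width for a mean of `n` values bounded in [0, 1]; how the paper turns such a bound into its overfitting discriminant is not shown here.

```python
import math
from collections import Counter


def decompose(outcomes):
    """Return fidelity, contradiction rate, and coverage-gap rate for a probe set.

    `outcomes` is a list of labels: "agree", "contradict", or "gap"
    (hypothetical labels chosen for this sketch).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one probe outcome")
    counts = Counter(outcomes)
    return {
        "fidelity": counts["agree"] / n,
        "contradiction": counts["contradict"] / n,
        "coverage_gap": counts["gap"] / n,
    }


def hoeffding_half_width(n, delta=0.05):
    """Half-width t such that P(|mean - expectation| >= t) <= delta for n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # Illustrative outcomes only; not data from the paper.
    rates = decompose(["agree"] * 7 + ["contradict"] * 2 + ["gap"])
    print(rates)
    print(hoeffding_half_width(200))
```

Keeping contradictions separate from coverage gaps matters for the edit loop: under this reading, a contradiction points to a spec statement that should be corrected, while a gap points to one that should be added.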
