Fidelity Probes for Specification-Code Alignment

Ferhat Erata; Hao Zhou; Luke Huan

arXiv:2605.17246 · cs.LG · May 19, 2026

TL;DR

This paper introduces fidelity probes, a method using natural-language questions derived from code artifacts to measure and improve specification-code alignment, demonstrated on a COBOL benchmark.

Contribution

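It generates fidelity probes either from a language model reading the code or from a static-analysis pipeline, in a tunable mixture, and uses the resulting contradiction and coverage-gap rates to drive targeted specification edits and to predict where fidelity will plateau.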

Findings

01

Frozen-test specification fidelity improved from 0.63 to 0.94 over eight iterations.

02

A two-state Markov fixed point predicts the plateau location from just four iterations of rate data.

03

Cross-model evaluation confirms convergence behavior is model-agnostic.
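
As a rough illustration of the second finding, the sketch below computes the stationary point of a generic two-state Markov chain over probe outcomes (agree / disagree). The parameter names `fix_rate` and `break_rate`, the update rule, and the numeric values are assumptions made for illustration; the listing does not give the paper's exact formulation or its measured rates.

```python
def fixed_point(fix_rate: float, break_rate: float) -> float:
    """Stationary agreeing fraction of a two-state chain.

    fix_rate:   assumed per-iteration probability that a disagreeing probe
                becomes agreeing after a spec edit (hypothetical name).
    break_rate: assumed per-iteration probability that an agreeing probe
                becomes disagreeing (hypothetical name).
    """
    total = fix_rate + break_rate
    if total <= 0:
        raise ValueError("at least one transition rate must be positive")
    return fix_rate / total


def trajectory(f0: float, fix_rate: float, break_rate: float, steps: int) -> list[float]:
    """Iterate F_{t+1} = F_t * (1 - break_rate) + (1 - F_t) * fix_rate."""
    values = [f0]
    for _ in range(steps):
        f = values[-1]
        values.append(f * (1.0 - break_rate) + (1.0 - f) * fix_rate)
    return values


# Illustrative rates only, not taken from the paper.
plateau = fixed_point(fix_rate=0.4, break_rate=0.1)
path = trajectory(f0=0.5, fix_rate=0.4, break_rate=0.1, steps=50)
```

Under these assumptions the iterated fidelity approaches `plateau` geometrically, which is why rates estimated from a few early iterations can indicate where the curve will level off.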

Abstract

We introduce fidelity probes: natural-language questions generated from a reference artifact with code-derived ground-truth answers, answered from a candidate specification. The fraction of agreeing probes, which we call the fidelity, decomposes into contradiction and coverage-gap rates that drive targeted spec edits to convergence. On a 15-program, roughly 12k-line COBOL benchmark (AWS CardDemo), we raise frozen-test specification fidelity from 0.63 to 0.94 over eight iterations, with the plateau location predicted by a two-state Markov fixed point $F^{\dagger}$ from just four iterations of rate data. Probes come from an LLM reading the code or from a static-analysis pipeline over its control-flow, data-flow, and system-dependence graphs, with a tunable mixture. A probe-resampling protocol with a frozen held-out set gives a Hoeffding-bounded overfitting discriminant; our measured…
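
The fidelity decomposition described in the abstract can be sketched as follows. This is a minimal illustration, assuming each probe outcome is exactly one of three hypothetical labels (`"agree"`, `"contradict"`, `"gap"`) so that the three rates sum to one; the label names and the `summarize` and `hoeffding_half_width` helpers are not from the paper. The second helper applies the standard two-sided Hoeffding inequality to a fraction estimated from `n` independent probes, which may differ from the specific discriminant the authors construct.

```python
import math
from collections import Counter


def summarize(outcomes: list[str]) -> dict[str, float]:
    """Split probe outcomes into fidelity, contradiction, and coverage-gap rates."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one probe outcome")
    counts = Counter(outcomes)
    return {
        "fidelity": counts["agree"] / n,            # spec answer matches code-derived answer
        "contradiction": counts["contradict"] / n,  # spec gives a conflicting answer
        "coverage_gap": counts["gap"] / n,          # spec does not answer the probe
    }


def hoeffding_half_width(n: int, delta: float) -> float:
    """Half-width eps such that P(|F_hat - F| >= eps) <= delta for n i.i.d. probes."""
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Made-up outcomes for illustration only.
outcomes = ["agree"] * 7 + ["contradict"] * 2 + ["gap"] * 1
rates = summarize(outcomes)
eps = hoeffding_half_width(n=200, delta=0.05)
```

Separating contradictions from coverage gaps matters because they suggest different edits: a contradiction points to a statement in the specification that should be corrected, while a gap points to behavior the specification should add.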
