Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

Qingyun Zou; Yingze Li; Tianen Liu; Bingsheng He; and Weng-Fai Wong

arXiv:2605.13501·cs.AR·May 14, 2026

Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

Qingyun Zou, Yingze Li, Tianen Liu, Bingsheng He, and Weng-Fai Wong

PDF

TL;DR

This paper introduces RWOPD, a novel on-policy distillation method that improves NL-to-SVA generation accuracy by optimizing property equivalence using a verifier-guided reward system.

Contribution

The paper proposes RWOPD, a property-aware distillation technique that leverages a property-verification reward to enhance SVA correctness in large language models.

Findings

01

RWOPD achieves state-of-the-art results on NL2SVA benchmarks.

02

Distilling with property-based rewards surpasses traditional token-level mimicry methods.

03

The approach improves pass@1, pass@5, and pass@10 metrics over prior models.

Abstract

LLM-based generation of SystemVerilog Assertions (SVA) is often reported as nearing saturation, with the strongest specialized model reaching $\sim 76%$ accuracy on NL2SVA-Human. We show that this aggregate hides a temporal gap: models that appear strong overall still collapse to a few implication templates on bounded-delay and liveness specifications. The core issue is that the dominant recipe, supervised fine-tuning on NL/SVA pairs, optimizes token-level mimicry rather than the \emph{property equivalence} that defines SVA correctness. We introduce \emph{Reward-Weighted On-Policy Distillation} (RWOPD), an on-policy distillation method that samples student rollouts, scores them with an open SymbiYosys+Z3 Property-Equivalence Checker (PEC), and applies a verifier-reward-weighted forward-KL gradient from a frozen 14B teacher on verifier-passable rollouts. This keeps the supervision…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.