ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning

Chengcao Yang

arXiv:2604.27644·cs.LG·May 8, 2026

TL;DR

ANCORA introduces a self-play paradigm in which a single policy learns both to generate verifiable problems and to solve them, relying on verifier feedback instead of human-annotated solutions, and substantially improves performance on program verification tasks.

Contribution

The paper presents ANCORA, a novel self-play framework whose stabilizing mechanisms let it bootstrap a verifiable curriculum from scratch; it outperforms existing methods on program verification benchmarks.

Findings

01

ANCORA achieves 81.5% pass@1 in Verus TTT, surpassing previous self-play methods.

02

Training from scratch yields competitive transfer performance on MBPP and HumanEval.

03

The stabilizers prevent Proposer collapse and make curriculum self-play effective.

Abstract

We propose a paradigm shift toward open-ended curriculum self-play: rather than learning to answer on a fixed prompt set, a unified policy learns to question: generating verifiable problems, solving them, and turning verifier feedback into self-improvement without human-annotated solutions. We introduce ANCORA, in which the policy alternates between a Proposer that synthesizes novel specifications and a Solver that produces verified solutions, anchored by three load-bearing mechanisms: a two-level group-relative update coupling Proposer advantages across specifications with Solver advantages across solution attempts; iterative self-distilled SFT projecting the base model onto its valid-output manifold before RL; and a UCB-guided Curriculum DAG whose policy-induced problem set can provably expand under self-composition. Without these stabilizers, sparse verifier feedback drives Proposer…
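The abstract describes the two-level group-relative update and the UCB-guided Curriculum DAG only in outline. The Python sketch below is one possible reading, not the authors' implementation. It assumes a binary verifier reward per solution attempt, a hypothetical Proposer reward `4p(1 - p)` that favours specifications the Solver passes about half the time, and the standard UCB1 bonus for picking curriculum nodes; the names `two_level_advantages`, `proposer_reward`, `Node`, `compose` and `select_node` are all hypothetical.

```python
import math
import statistics
from dataclasses import dataclass, field


def group_relative(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def proposer_reward(pass_rate):
    """Hypothetical Proposer reward (an assumption, not from the paper):
    peaks at pass_rate = 0.5 and is zero for specs that are always or never solved."""
    return 4.0 * pass_rate * (1.0 - pass_rate)


def two_level_advantages(solver_rewards_per_spec):
    """solver_rewards_per_spec[i][j] is the verifier reward (0/1) of attempt j on spec i.

    Level 1 (Solver): normalize across attempts on the same specification.
    Level 2 (Proposer): normalize a per-spec reward across specifications in the batch.
    """
    solver_adv = [group_relative(rs) for rs in solver_rewards_per_spec]
    pass_rates = [statistics.fmean(rs) for rs in solver_rewards_per_spec]
    proposer_adv = group_relative([proposer_reward(p) for p in pass_rates])
    return proposer_adv, solver_adv


@dataclass
class Node:
    """Hypothetical curriculum node: one problem in the DAG, with its parent problems."""
    name: str
    parents: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def compose(a, b):
    """Hypothetical self-composition: a new node whose DAG parents are a and b."""
    return Node(name=f"{a.name}+{b.name}", parents=[a, b])


def ucb1_score(node, total_visits, c=math.sqrt(2)):
    """Standard UCB1: mean reward plus an exploration bonus for rarely visited nodes."""
    if node.visits == 0:
        return math.inf  # every node is tried at least once
    mean = node.total_reward / node.visits
    return mean + c * math.sqrt(math.log(total_visits) / node.visits)


def select_node(nodes):
    """Pick the curriculum node with the highest UCB1 score."""
    total = max(sum(n.visits for n in nodes), 1)
    return max(nodes, key=lambda n: ucb1_score(n, total))
```

For example, with Solver rewards `[[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]`, only the first specification receives a positive Proposer advantage, while the all-pass and all-fail groups yield zero Solver advantage. This illustrates how sparse verifier feedback can leave little learning signal, which is the failure mode the stabilizers are meant to counter.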
