VeRA: Verified Reasoning Data Augmentation at Scale
Zerui Cheng, Jiashuo Liu, Chunjie Wu, Jianzhu Yao, Pramod Viswanath, Ge Zhang, Wenhao Huang

TL;DR
VeRA introduces a scalable framework for generating verified, diverse problem variants from a single seed, improving AI evaluation robustness and enabling continuous, cost-effective benchmarking without human labeling.
Contribution
VeRA provides a novel method to automatically generate verified problem variants from a single seed, transforming static benchmarks into scalable, executable specifications for robust AI evaluation.
Findings
VeRA-E improves evaluation quality and detects contamination.
VeRA-H enables automatic creation of hard, verifiable tasks.
VeRA establishes a new paradigm for scalable, verified benchmarks.
Abstract
The main issue with most evaluation schemes today is their "static" nature: the same problems are reused repeatedly, allowing for memorization, format exploitation, and eventual saturation. To measure genuine AI progress, we need evaluation that is robust by construction, not by post-hoc detection. In response, we propose VeRA (Verified Reasoning Data Augmentation), a framework that converts benchmark problems into executable specifications, comprising (i) a natural language template with placeholder slots, (ii) a coherent generator that samples valid configurations, and (iii) a deterministic verifier that validates parameters and calculates the corresponding correct answers for each configuration. From a single seed problem, VeRA automatically creates unlimited verified variants with reliable labels at near-zero marginal cost without human involvement. VeRA operates in two…
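As a rough illustration of the three components the abstract names (template, generator, verifier), here is a minimal Python sketch. The seed problem, slot names, value ranges, and every function name below are hypothetical and chosen only for illustration. They are not taken from the paper, whose actual specification format may differ.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical seed problem. The template text, slot names, and ranges
# are illustrative assumptions, not the paper's own specification.
TEMPLATE = (
    "A train travels at {speed} km/h for {hours} hours. "
    "How many kilometres does it cover?"
)


@dataclass(frozen=True)
class Variant:
    question: str
    answer: int


def generate(rng: random.Random) -> Dict[str, int]:
    """Generator: sample one configuration for the template slots."""
    return {"speed": rng.randint(20, 200), "hours": rng.randint(1, 12)}


def verify(config: Dict[str, int]) -> int:
    """Verifier: reject invalid parameters, then compute the answer."""
    speed, hours = config["speed"], config["hours"]
    if not (isinstance(speed, int) and isinstance(hours, int)):
        raise ValueError("slots must be integers")
    if speed <= 0 or hours <= 0:
        raise ValueError("speed and hours must be positive")
    # Deterministic: the same configuration always yields the same label.
    return speed * hours


def make_variants(n: int, seed: int = 0) -> List[Variant]:
    """Produce n labelled variants from the single seed template."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        config = generate(rng)
        answer = verify(config)  # label comes from code, not from a human
        variants.append(Variant(TEMPLATE.format(**config), answer))
    return variants
```

Calling `make_variants(3, seed=1)` returns three question/answer pairs, and the same seed reproduces the same set. In this sketch the label is computed by `verify` rather than by the sampler, so a configuration that fails validation never yields a labelled problem.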
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Topic Modeling
