TL;DR
This paper introduces SSLogic, an agentic meta-synthesis framework that evolves task-family specifications for logical reasoning, improving both the training utility of synthesized data and downstream performance under reinforcement learning.
Contribution
SSLogic shifts the evolvable unit from instance-level perturbations to task-family specifications, which are evolved through an iterative Generate-Validate-Refine loop with multi-strategy consensus validation.
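The control flow of a Generate-Validate-Refine loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SSLogic's implementation. It treats a task family as a pair of Python callables: a generator that samples instances and a validator that checks them. The LLM agent's refinement step is replaced by a hand-written function. `TaskFamily`, `generate_validate_refine`, `mock_refine`, and the toy addition family are all hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical representation: a task family is an executable
# Generator-Validator pair (not SSLogic's actual interface).
@dataclass
class TaskFamily:
    name: str
    generator: Callable[[random.Random], dict]
    validator: Callable[[dict], bool]

Refiner = Callable[[TaskFamily, List[dict]], TaskFamily]

def generate_validate_refine(family: TaskFamily, refine: Refiner,
                             n_instances: int = 8, max_rounds: int = 3,
                             seed: int = 0) -> Optional[TaskFamily]:
    rng = random.Random(seed)
    for _ in range(max_rounds):
        # Generate: sample a batch of instances from the current generator.
        batch = [family.generator(rng) for _ in range(n_instances)]
        # Validate: keep the instances the validator rejects.
        failures = [inst for inst in batch if not family.validator(inst)]
        if not failures:
            return family  # every sampled instance passed; accept the family
        # Refine: hand the failures back to the (mocked) authoring agent.
        family = refine(family, failures)
    return None  # still failing after max_rounds; discard the family

# Toy family: "add two digits". The first generator is deliberately buggy.
def buggy_gen(rng: random.Random) -> dict:
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return {"a": a, "b": b, "answer": a - b}  # bug: subtracts instead of adds

def fixed_gen(rng: random.Random) -> dict:
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return {"a": a, "b": b, "answer": a + b}

def check_sum(inst: dict) -> bool:
    return inst["answer"] == inst["a"] + inst["b"]

def mock_refine(family: TaskFamily, failures: List[dict]) -> TaskFamily:
    # Stand-in for an LLM agent that rewrites the generator after seeing failures.
    return TaskFamily(family.name + "-refined", fixed_gen, family.validator)

result = generate_validate_refine(TaskFamily("toy-sum", buggy_gen, check_sum), mock_refine)
print(result.name if result else "discarded")  # toy-sum-refined
```

The sketch shows only the accept/refine/discard control flow. According to the paper, the real loop also has agents author new rules and difficulty gradients, which goes beyond repairing a buggy generator.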
Findings
Evolved data improves training utility on Enigmata benchmarks.
Starting from 400 seed families, two evolution rounds produce 953 families and 21,389 verifiable instances.
Structural evolution enhances logic and operation capabilities, boosting downstream task performance.
Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) is bottlenecked by data: existing synthesis pipelines rely on expert-written code or fixed templates, confining growth to instance-level perturbations. We shift the evolvable unit from problem instances to task-family specifications. SSLogic is an agentic meta-synthesis framework in which LLM agents iteratively author and refine executable Generator-Validator pairs inside a closed Generate-Validate-Refine loop, producing families with new rules and difficulty gradients rather than parameter variations of old ones. A Multi-Gate Validation Protocol -- multi-strategy consensus plus Adversarial Blind Review, where independent agents solve each instance by writing and executing code -- filters ill-posed tasks before they enter training. Starting from 400 seed families, two evolution rounds yield 953 families and 21,389 verifiable…
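The two validation gates described in the abstract can be sketched on a toy problem: counting the multiples of k in [lo, hi]. This is a hedged illustration, not the paper's protocol code. Gate 1 requires several independent solving strategies to agree with one another and with the instance's stated answer. Gate 2 is a blind reviewer that never sees the stated answer. In SSLogic the reviewers are LLM agents that write and execute code; here, plain functions stand in for them. All function names and the toy problem are hypothetical.

```python
from typing import Any, Callable, Dict, List

Instance = Dict[str, Any]
Solver = Callable[[Instance], int]

def brute_force(inst: Instance) -> int:
    # Strategy 1: enumerate every integer in [lo, hi].
    return sum(1 for x in range(inst["lo"], inst["hi"] + 1) if x % inst["k"] == 0)

def closed_form(inst: Instance) -> int:
    # Strategy 2: floor-division formula for the number of multiples of k in [lo, hi].
    return inst["hi"] // inst["k"] - (inst["lo"] - 1) // inst["k"]

def blind_solver(inst: Instance) -> int:
    # Independent reviewer: step through multiples starting at ceil(lo / k) * k.
    first = -(-inst["lo"] // inst["k"]) * inst["k"]
    return len(range(first, inst["hi"] + 1, inst["k"]))

def multi_gate_validate(inst: Instance, strategies: List[Solver], reviewer: Solver) -> bool:
    # Gate 1: multi-strategy consensus. All strategies must agree with each
    # other and with the answer shipped with the instance.
    answers = {solve(inst) for solve in strategies}
    if len(answers) != 1 or inst["answer"] not in answers:
        return False
    # Gate 2: blind review. The reviewer sees the instance without its answer.
    blind_view = {key: v for key, v in inst.items() if key != "answer"}
    return reviewer(blind_view) == inst["answer"]

good = {"lo": 1, "hi": 10, "k": 3, "answer": 3}
bad = {"lo": 1, "hi": 10, "k": 3, "answer": 4}  # ill-posed: stated answer is wrong
print(multi_gate_validate(good, [brute_force, closed_form], blind_solver))  # True
print(multi_gate_validate(bad, [brute_force, closed_form], blind_solver))   # False
```

Hiding the answer in gate 2 keeps the reviewer from simply confirming the label, so an instance passes only if its answer can be reproduced independently.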
