PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data

Kai Xiong; Yanwei Huang; Rongjunchen Zhang; Kun Chen; Haipang Wu

arXiv:2508.15180·cs.AI·August 26, 2025

PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data

Kai Xiong, Yanwei Huang, Rongjunchen Zhang, Kun Chen, Haipang Wu

PDF

1 Datasets

TL;DR

PuzzleClone introduces a scalable SMT-based framework for generating large, diverse, and verifiable logical puzzles to enhance the reasoning abilities of large language models, significantly improving their performance on related benchmarks.

Contribution

It presents a novel formal framework for synthesizing verifiable puzzles at scale using SMT, including encoding, randomization, and validation techniques, and demonstrates its effectiveness in improving LLM reasoning.

Findings

01

Generated over 83K validated puzzles with diverse formats and difficulty levels.

02

Training on PuzzleClone data significantly improves LLM performance on logic benchmarks.

03

Post training increases PuzzleClone scores from 14.4 to 56.2 and boosts other benchmarks by up to 12.5 points.

Abstract

High-quality mathematical and logical datasets with verifiable answers are essential for strengthening the reasoning capabilities of large language models (LLMs). While recent data augmentation techniques have facilitated the creation of large-scale benchmarks, existing LLM-generated datasets often suffer from limited reliability, diversity, and scalability. To address these challenges, we introduce PuzzleClone, a formal framework for synthesizing verifiable data at scale using Satisfiability Modulo Theories (SMT). Our approach features three key innovations: (1) encoding seed puzzles into structured logical specifications, (2) generating scalable variants through systematic variable and constraint randomization, and (3) ensuring validity via a reproduction mechanism. Applying PuzzleClone, we construct a curated benchmark comprising over 83K diverse and programmatically validated…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

hithink-ai/PuzzleClone
dataset· 52 dl
52 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.