Arbitrarily Large Labelled Random Satisfiability Formulas for Machine Learning Training
Dimitris Achlioptas, Amrit Daswaney, Periklis A. Papakonstantinou

TL;DR
This paper introduces a method to generate arbitrarily large, correctly labeled random SAT formulas for training machine learning models, enabling better generalization to real-world problem sizes.
Contribution
It presents a probabilistic method to generate large labeled SAT formulas without solving them, facilitating scalable training for machine learning models.
Findings
State-of-the-art models perform no better than random on large formulas.
A new classifier achieves 99% accuracy on large datasets.
Learning based on solver computation prefixes offers a novel approach.
Abstract
Applying deep learning to solve real-life instances of hard combinatorial problems has tremendous potential. Research in this direction has focused on the Boolean satisfiability (SAT) problem, both because of its theoretical centrality and practical importance. A major roadblock faced, though, is that training sets are restricted to random formulas of size several orders of magnitude smaller than formulas of practical interest, raising serious concerns about generalization. This is because labeling random formulas of increasing size rapidly becomes intractable. By exploiting the probabilistic method in a fundamental way, we remove this roadblock entirely: we show how to generate correctly labeled random formulas of any desired size, without having to solve the underlying decision problem. Moreover, the difficulty of the classification task for the formulas produced by our generator is…
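To make the idea of "labels without solving" concrete, here is a minimal sketch of a different, much older technique: planting a hidden assignment. It is not the generator described in the paper, whose details are not given in the text above. The function names `planted_ksat` and `satisfies` and all parameter values are hypothetical. Planting produces only satisfiable formulas, and those formulas are not drawn from the same distribution as uniformly random ones, so treat it purely as an illustration of how a label can be known by construction.

```python
import random


def planted_ksat(n_vars, n_clauses, k=3, seed=0):
    """Return (clauses, hidden), where `hidden` satisfies every clause.

    Illustrative planted-solution generator, NOT the paper's method.
    Literals follow the DIMACS convention: +v is variable v, -v its negation.
    """
    rng = random.Random(seed)
    # Hidden assignment: hidden[v] is the truth value of variable v (1-indexed).
    hidden = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    clauses = []
    while len(clauses) < n_clauses:
        variables = rng.sample(range(1, n_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # Rejection step: keep a clause only if the hidden assignment
        # makes at least one of its literals true.
        if any((lit > 0) == hidden[abs(lit)] for lit in clause):
            clauses.append(clause)
    return clauses, hidden


def satisfies(clauses, assignment):
    """Check in linear time that `assignment` satisfies every clause."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )


# A formula with 10,000 variables is labeled "satisfiable" by construction;
# no SAT solver is ever called.
clauses, hidden = planted_ksat(n_vars=10_000, n_clauses=42_000)
label_is_sat = satisfies(clauses, hidden)
```

Generation and verification both take time linear in the formula size, which is why the label stays cheap at any scale. The sketch cannot produce unsatisfiable examples, so it does not by itself yield a balanced training set.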
Taxonomy
Topics: Machine Learning in Materials Science · Bayesian Modeling and Causal Inference · Machine Learning and Data Classification
