SAT-sampling for statistical significance testing in sparse contingency tables
Patrick Scharpfenecker, Tobias Windisch

TL;DR
This paper introduces a SAT-based sampling method for exact significance testing in sparse contingency tables, offering a practical alternative to traditional Markov basis MCMC with improved performance in challenging cases.
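Exact significance testing compares an observed statistic with its distribution over the fiber, meaning all tables that share the observed margins. The following minimal Python sketch is an illustration, not the authors' estimator. It makes three assumptions. First, the null model is independence, so P(T | margins) is proportional to 1/∏ T_ij!. Second, the statistic is Pearson's chi-square. Third, tables arrive uniformly from the fiber, which is the ideal output of a uniform SAT sampler. Under these assumptions, self-normalized importance weights recover the p-value. All function names are hypothetical, and the fiber is enumerated by brute force for a tiny table.

```python
import itertools
import math
import random

def enumerate_fiber(row_sums, col_sums):
    """Brute-force list of all non-negative tables with the given margins."""
    r, c = len(row_sums), len(col_sums)
    ranges = [range(min(row_sums[i], col_sums[j]) + 1) for i in range(r) for j in range(c)]
    tables = []
    for vals in itertools.product(*ranges):
        t = tuple(tuple(vals[i * c:(i + 1) * c]) for i in range(r))
        if all(sum(t[i]) == row_sums[i] for i in range(r)) and \
           all(sum(t[i][j] for i in range(r)) == col_sums[j] for j in range(c)):
            tables.append(t)
    return tables

def chi_square(t):
    """Pearson statistic against the independence fit r_i * c_j / n (margins assumed > 0)."""
    rows = [sum(row) for row in t]
    cols = [sum(col) for col in zip(*t)]
    n = sum(rows)
    return sum((t[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def log_weight(t):
    """log of 1 / prod(T_ij!), the unnormalized conditional null probability."""
    return -sum(math.lgamma(x + 1) for row in t for x in row)

def weighted_p_value(observed, samples):
    """Self-normalized importance-sampling p-value from uniform fiber samples."""
    obs = chi_square(observed)
    logs = [log_weight(t) for t in samples]
    top = max(logs)  # shift before exponentiating to avoid underflow
    weights = [math.exp(lw - top) for lw in logs]
    # small tolerance so floating-point ties with the observed statistic count as extreme
    extreme = sum(w for w, t in zip(weights, samples) if chi_square(t) >= obs - 1e-9)
    return extreme / sum(weights)

observed = ((3, 0), (0, 3))
fiber = enumerate_fiber([3, 3], [3, 3])
rng = random.Random(0)
draws = [rng.choice(fiber) for _ in range(20000)]
print(weighted_p_value(observed, fiber))  # exact: every fiber table used once, about 0.1
print(weighted_p_value(observed, draws))  # Monte Carlo estimate from uniform draws
```

Here the fiber has four tables, and the exact conditional p-value is 0.1. The estimate from uniform draws should land close to that value. A sampler that is only approximately uniform would bias this estimate unless its proposal probabilities are accounted for.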
Contribution
The authors develop a SAT-based sampling approach for contingency tables that handles structural zeros and sparse data more efficiently than classical methods.
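The following toy sketch makes the encoding idea concrete. Each free cell count becomes a small binary counter of Boolean variables, structural zeros get no variables at all, and the row and column sum checks act as the constraint that a satisfying assignment must meet. The simplifications are substantial. A real encoding would compile the margins into a Boolean circuit or CNF and hand it to a SAT sampler. Here, exhaustive search over all assignments stands in for the solver, so the approach only works for tiny tables. The function names are hypothetical.

```python
import itertools
import random

def encode_and_solve(row_sums, col_sums, zeros=frozenset()):
    """Toy SAT view of a fiber: every free cell is a binary counter of Boolean
    variables, structural zeros get no variables, and the margin checks act as
    the circuit output. Exhaustive search replaces a real SAT solver/sampler."""
    r, c = len(row_sums), len(col_sums)
    bits = max(list(row_sums) + list(col_sums)).bit_length()  # enough bits for any cell
    free = [(i, j) for i in range(r) for j in range(c) if (i, j) not in zeros]
    solutions = []
    for assignment in itertools.product((False, True), repeat=bits * len(free)):
        table = [[0] * c for _ in range(r)]
        for k, (i, j) in enumerate(free):
            chunk = assignment[k * bits:(k + 1) * bits]
            # decode the little-endian binary counter for cell (i, j)
            table[i][j] = sum(1 << b for b, bit in enumerate(chunk) if bit)
        rows_ok = all(sum(table[i]) == row_sums[i] for i in range(r))
        cols_ok = all(sum(table[i][j] for i in range(r)) == col_sums[j] for j in range(c))
        if rows_ok and cols_ok:
            solutions.append(tuple(map(tuple, table)))
    return solutions

def sample_uniform(solutions, n, seed=0):
    """Ideal uniform sampler over satisfying assignments; practical SAT samplers
    may only approximate this."""
    rng = random.Random(seed)
    return [rng.choice(solutions) for _ in range(n)]

print(len(encode_and_solve([2, 1], [1, 2])))                 # 2 tables
print(encode_and_solve([2, 1], [1, 2], zeros={(0, 0)}))      # a single table
```

Because a structural zero contributes no variables, it is excluded by construction rather than by an extra constraint. This is one reason a constraint-based encoding handles such cells naturally.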
Findings
SAT-based samplers produce reliable p-values in sparse tables
Hybrid schemes improve sampling accuracy and efficiency
Outperforms traditional Markov basis methods in benchmarks
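A generic Metropolis-Hastings construction can illustrate the hybrid idea. This is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the following:

- The target is the conditional null distribution proportional to 1/∏ T_ij!.
- The global proposal is exactly uniform over the fiber. A brute-force list stands in for a SAT sampler.
- The local moves are the basic ±1 moves on 2x2 minors.

Each kernel leaves the target invariant on its own, so a random mixture of the two does too. The global kernel also supplies connectivity where local moves cannot. All names are hypothetical.

```python
import itertools
import math
import random

def enumerate_fiber(row_sums, col_sums, zeros=frozenset()):
    """Brute-force stand-in for the solution set a uniform SAT sampler draws from."""
    r, c = len(row_sums), len(col_sums)
    ranges = [range(1) if (i, j) in zeros else range(min(row_sums[i], col_sums[j]) + 1)
              for i in range(r) for j in range(c)]
    fiber = []
    for vals in itertools.product(*ranges):
        t = tuple(tuple(vals[i * c:(i + 1) * c]) for i in range(r))
        if all(sum(t[i]) == row_sums[i] for i in range(r)) and \
           all(sum(t[i][j] for i in range(r)) == col_sums[j] for j in range(c)):
            fiber.append(t)
    return fiber

def log_target(t):
    """log of 1 / prod(T_ij!): the conditional null distribution on the fiber."""
    return -sum(math.lgamma(x + 1) for row in t for x in row)

def local_move(t, zeros, rng):
    """Symmetric random +/-1 basic move on a 2x2 minor; None if it leaves the fiber."""
    r, c = len(t), len(t[0])
    i1, i2 = rng.sample(range(r), 2)
    j1, j2 = rng.sample(range(c), 2)
    s = rng.choice((1, -1))
    new = [list(row) for row in t]
    for (i, j), d in (((i1, j1), s), ((i2, j2), s), ((i1, j2), -s), ((i2, j1), -s)):
        new[i][j] += d
        if new[i][j] < 0 or (i, j) in zeros:
            return None
    return tuple(map(tuple, new))

def hybrid_chain(start, fiber, zeros, steps, p_global=0.3, seed=0):
    """Mix an independence kernel (uniform proposals) with local moves. Both use
    the same Metropolis-Hastings target, so the mixture keeps it invariant."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(steps):
        y = rng.choice(fiber) if rng.random() < p_global else local_move(x, zeros, rng)
        # symmetric/uniform proposals: acceptance ratio reduces to target(y) / target(x)
        if y is not None and rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        samples.append(x)
    return samples

fiber = enumerate_fiber([2, 1], [1, 2])
chain = hybrid_chain(fiber[0], fiber, frozenset(), steps=20000)
print(chain.count(((1, 1), (0, 1))) / len(chain))  # target probability is 2/3
```

For margins (2, 1) and (1, 2), the fiber contains two tables with target probabilities 1/3 and 2/3. The printed frequency should therefore be close to 2/3. If the global proposal were not uniform, the independence step would also need the factor q(x)/q(y) in its acceptance ratio.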
Abstract
Exact conditional tests for contingency tables require sampling from fibers with fixed margins. Classical Markov basis MCMC is general but often impractical: computing full Markov bases that connect all fibers of a given constraint matrix can be infeasible, and the resulting chains may converge slowly, especially in sparse settings or in the presence of structural zeros. We introduce a SAT-based alternative that encodes fibers as Boolean circuits, which allows modern SAT samplers to generate tables randomly. We analyze the sampling bias that SAT samplers may introduce, provide diagnostics, and propose practical mitigations. We also propose hybrid MCMC schemes that combine SAT proposals with local moves to ensure correct stationary distributions. These schemes do not necessarily require connectivity via local moves, which is particularly beneficial in the presence of structural zeros. Across benchmarks, including…
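A tiny example reproduces the connectivity problem with structural zeros. The example is chosen here for illustration and is not taken from the paper: a 3x3 table with structural zeros on the diagonal and all margins equal to 1. The sketch assumes the usual basic ±1 moves on 2x2 minors as the local moves. It enumerates the fiber and then runs a graph search over those moves. The function names are hypothetical.

```python
import itertools

def enumerate_fiber(row_sums, col_sums, zeros):
    """All non-negative tables with the given margins and zeros at `zeros` (brute force)."""
    r, c = len(row_sums), len(col_sums)
    ranges = [range(1) if (i, j) in zeros else range(min(row_sums[i], col_sums[j]) + 1)
              for i in range(r) for j in range(c)]
    tables = []
    for vals in itertools.product(*ranges):
        t = tuple(tuple(vals[i * c:(i + 1) * c]) for i in range(r))
        if all(sum(t[i]) == row_sums[i] for i in range(r)) and \
           all(sum(t[i][j] for i in range(r)) == col_sums[j] for j in range(c)):
            tables.append(t)
    return tables

def basic_move_neighbors(t, zeros):
    """Tables reachable from t by one +/-1 basic move on a 2x2 minor."""
    r, c = len(t), len(t[0])
    neighbors = set()
    for i1, i2 in itertools.combinations(range(r), 2):
        for j1, j2 in itertools.combinations(range(c), 2):
            for s in (1, -1):
                new = [list(row) for row in t]
                corners = (((i1, j1), s), ((i2, j2), s), ((i1, j2), -s), ((i2, j1), -s))
                for (i, j), d in corners:
                    new[i][j] += d
                # reject moves that go negative or put mass on a structural zero
                if all(new[i][j] >= 0 and (i, j) not in zeros for (i, j), _ in corners):
                    neighbors.add(tuple(map(tuple, new)))
    return neighbors

def reachable(start, zeros):
    """Depth-first search over the basic-move graph starting at `start`."""
    seen, stack = {start}, [start]
    while stack:
        for nb in basic_move_neighbors(stack.pop(), zeros):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen

zeros = {(0, 0), (1, 1), (2, 2)}
tables = enumerate_fiber([1, 1, 1], [1, 1, 1], zeros)
print(len(tables))                       # 2
print(len(reachable(tables[0], zeros)))  # 1
```

Both tables in this fiber are isolated, because every basic move would place a count on a structural zero. A chain that uses only these moves therefore never leaves its starting table. A sampler that proposes directly from the fiber does not depend on this move graph.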
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Markov Chains and Monte Carlo Methods · Data Quality and Management
