Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science
Tsuyoshi Okita

TL;DR
The paper introduces CFM-SD, a method that uses physical simulators as do-operators for causal discovery in scientific domains with latent confounders, enabling effective causal inference with real interventions.
Contribution
It proposes a novel approach integrating physical simulators into causal discovery, handling latent confounders and real interventions, with theoretical identifiability and practical validation.
Findings
CFM-SD achieves a high F1 score of 0.800 on synthetic data.
It reduces bias by 57-58% in molecular toxicity and battery optimization.
The method is practically effective beyond synthetic benchmarks.
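For readers unfamiliar with how an F1 score is typically computed for recovered causal graphs, here is a minimal sketch. It assumes F1 is measured over directed edges, so a reversed edge counts as both a false positive and a false negative. The summary does not state the paper's exact evaluation protocol, and the function name `edge_f1` and the example graph are hypothetical.

```python
def edge_f1(true_edges, pred_edges):
    """F1 over directed edges (assumed metric; the paper's protocol may differ)."""
    true_set, pred_set = set(true_edges), set(pred_edges)
    tp = len(true_set & pred_set)   # edges recovered with the correct direction
    fp = len(pred_set - true_set)   # predicted edges absent from the true graph
    fn = len(true_set - pred_set)   # true edges the method missed
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical 4-variable example: one edge reversed, one spurious edge.
true_graph = [("A", "B"), ("B", "C"), ("C", "D")]
predicted = [("A", "B"), ("B", "C"), ("D", "C"), ("A", "D")]
score = edge_f1(true_graph, predicted)  # tp=2, fp=2, fn=1
```

Here precision is 2/4 and recall is 2/3, so the score is 4/7, roughly 0.571.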
Abstract
Existing interventional causal discovery methods -- IGSP, DCDI, ENCO -- assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as do-operators in Pearl's interventional calculus to simultaneously handle latent confounders and real interventional data. Theoretically, -variable causal structure is identifiable with single-variable interventions -- the minimum under physical realizability constraints. In Intrinsic Evaluation on synthetic data (--), CFM-SD achieves average F1 vs.…
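To illustrate the core idea of treating a simulator as a do-operator, the following toy sketch contrasts an observational estimate with an interventional one when a latent confounder is present. It is not the CFM-SD method. The `simulate` function, its linear structural equations, and all coefficients are assumptions made purely for illustration, standing in for a first-principles simulator.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Toy stand-in for a physical simulator (hypothetical).

    U is a latent confounder of X and Y; the true effect of X on Y is 2.0.
    Passing do_x replaces X's own mechanism, mimicking do(X = do_x).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent confounder, never observed
        x = (u + rng.gauss(0.0, 1.0)) if do_x is None else do_x
        y = 2.0 * x + 3.0 * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Observational data: the regression slope absorbs the confounding path via U.
xs, ys = simulate(20000)
obs_slope = ols_slope(xs, ys)

# Interventional data: the simulator sets X directly, cutting the U -> X edge.
_, y_at_1 = simulate(20000, do_x=1.0, seed=1)
_, y_at_0 = simulate(20000, do_x=0.0, seed=2)
int_effect = mean(y_at_1) - mean(y_at_0)
```

In this toy model the observational slope converges to 2 + 3 Cov(X, U) / Var(X) = 3.5, while the interventional contrast converges to the true effect of 2.0. That gap is the kind of confounding bias that simulator-based interventions are meant to remove.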
