Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science

Tsuyoshi Okita

arXiv:2605.07467·cs.LG·May 11, 2026

TL;DR

The paper introduces CFM-SD, a method that uses physical simulators as do-operators for causal discovery in scientific domains with latent confounders, enabling effective causal inference with real interventions.

Contribution

The paper integrates first-principles physical simulators into interventional causal discovery, so that the method handles latent confounders and real interventional data together. It backs this with a theoretical identifiability result and empirical validation.

Findings

01. CFM-SD achieves an average F1 score of 0.800 on synthetic data.

02. It reduces bias by 57-58% in molecular toxicity and battery optimization tasks.

03. The method is practically effective beyond synthetic benchmarks.
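To make the F1 figure concrete, here is a minimal sketch of how an F1 score over directed edges is commonly computed in structure learning. It assumes that F1 is measured on directed edges, which this summary does not confirm. The function name `edge_f1` and the five-edge graph are hypothetical and are not data from the paper.

```python
def edge_f1(true_edges, pred_edges):
    """F1 score over directed edges; a reversed edge counts as wrong."""
    true_set, pred_set = set(true_edges), set(pred_edges)
    tp = len(true_set & pred_set)  # edges recovered with the correct direction
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth graph and a prediction with one reversed edge.
true_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
pred_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "D")]

score = edge_f1(true_edges, pred_edges)
print(round(score, 3))
```

In this toy case four of five edges are recovered, so precision and recall are both 0.8 and the F1 score is about 0.8.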

Abstract

Existing interventional causal discovery methods (IGSP, DCDI, ENCO) assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as do-operators in Pearl's interventional calculus to simultaneously handle latent confounders and real interventional data. Theoretically, a $d$-variable causal structure is identifiable with $O(d)$ single-variable interventions, the minimum under physical realizability constraints. In Intrinsic Evaluation on synthetic data ($\gamma = 0.2$ to $0.8$), CFM-SD achieves average F1 $= 0.800$ vs.…
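The core idea of treating a simulator as a do-operator can be illustrated with a toy structural causal model. This is a sketch under stated assumptions, not CFM-SD itself: the function names (`simulate`, `ols_slope`, `mean_y`), the linear mechanisms, and all coefficients are invented for illustration. A latent variable `U` drives both `X` and `Y`, so a regression on observational data overstates the effect of `X` on `Y`. Overriding `X` inside the simulator, which is what `do(X = x)` means, recovers the true effect.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Toy simulator with a latent confounder U (all values hypothetical).

    Graph: U -> X, U -> Y, X -> Y, with true causal effect of X on Y = 1.0.
    Passing do_x overrides the mechanism for X, i.e. performs do(X = do_x).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent: never returned to the caller
        x = do_x if do_x is not None else u + rng.gauss(0.0, 1.0)
        y = 1.0 * x + 2.0 * u + rng.gauss(0.0, 0.1)
        samples.append((x, y))
    return samples


def ols_slope(samples):
    """Slope of the least-squares regression of y on x."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y_ = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y_) for x, y in samples)
    var = sum((x - mean_x) ** 2 for x, _ in samples)
    return cov / var


def mean_y(samples):
    return sum(y for _, y in samples) / len(samples)


# Observational estimate: biased upward by the hidden path X <- U -> Y.
observational_slope = ols_slope(simulate(20000))

# Interventional estimate: difference of E[Y] under do(X=1) and do(X=0).
interventional_effect = (
    mean_y(simulate(20000, do_x=1.0, seed=1))
    - mean_y(simulate(20000, do_x=0.0, seed=2))
)

print(f"observational slope:   {observational_slope:.2f}")
print(f"interventional effect: {interventional_effect:.2f}")
```

With these made-up coefficients the observational slope converges to about 2.0, while the interventional contrast converges to the true effect of 1.0. In the setting the abstract describes, each call with `do_x` would correspond to an expensive physics-based simulation, which is why the number of required interventions matters.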
