$\texttt{causalAssembly}$: Generating Realistic Production Data for Benchmarking Causal Discovery
Konstantin Göbler, Tobias Windisch, Mathias Drton, Tim Pychynski, Steffen Sonntag, Martin Roth

TL;DR
This paper introduces causalAssembly, a Python library that generates realistic semi-synthetic manufacturing data with known causal relationships, enabling benchmarking of causal discovery algorithms in complex, real-world scenarios.
Contribution
The paper presents a novel system for creating semi-synthetic data with ground truth causal relations, facilitating empirical validation of causal discovery methods.
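To make the idea of semi-synthetic data with ground truth concrete, here is a minimal sketch of one way such data can be produced: fit a conditional model for each variable given its parents in a known graph, then sample the variables in topological order. It is not the causalAssembly API. The function names, the toy variables `a`, `b`, `c`, and the linear-fit-plus-residual-resampling model are all assumptions made for illustration; the paper itself uses distributional random forests for the conditional distributions.

```python
import numpy as np


def topological_order(parents):
    """Order nodes so that each node appears after all of its parents.

    `parents` maps every node name to a list of its parent names;
    every node, including roots, must appear as a key.
    """
    order, placed = [], set()
    remaining = dict(parents)
    while remaining:
        ready = [n for n, ps in remaining.items() if all(p in placed for p in ps)]
        if not ready:
            raise ValueError("graph contains a cycle or an undeclared parent")
        for n in ready:
            order.append(n)
            placed.add(n)
            del remaining[n]
    return order


def fit_and_sample(data, parents, n_samples, rng):
    """Draw synthetic samples that follow the given ground-truth graph.

    ASSUMPTION: each conditional is a linear fit plus resampled residuals.
    This is a simple stand-in, not a distributional random forest.
    """
    n_obs = len(next(iter(data.values())))
    synthetic = {}
    for node in topological_order(parents):
        pa = parents[node]
        if not pa:
            # Root node: resample from the empirical marginal.
            synthetic[node] = rng.choice(data[node], size=n_samples, replace=True)
            continue
        # Fit node ~ intercept + parents on the observed data.
        X = np.column_stack([np.ones(n_obs)] + [data[p] for p in pa])
        coef, *_ = np.linalg.lstsq(X, data[node], rcond=None)
        residuals = data[node] - X @ coef
        # Predict from the already-sampled synthetic parents, then add noise.
        X_new = np.column_stack([np.ones(n_samples)] + [synthetic[p] for p in pa])
        noise = rng.choice(residuals, size=n_samples, replace=True)
        synthetic[node] = X_new @ coef + noise
    return synthetic


# Toy "observed" data generated from a known graph a -> b, a -> c, b -> c.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = 2.0 * a + 0.1 * rng.normal(size=1000)
c = b - a + 0.1 * rng.normal(size=1000)
observed = {"a": a, "b": b, "c": c}
graph = {"c": ["a", "b"], "a": [], "b": ["a"]}

synthetic = fit_and_sample(observed, graph, n_samples=500, rng=rng)
```

Because every synthetic variable is drawn only from its parents in `graph`, that graph is the ground truth for the generated sample by construction, and the original measurements are never released directly.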
Findings
The generated data can be used to benchmark causal discovery algorithms.
Distributional random forests enable flexible estimation of conditional distributions.
Known ground-truth causal relationships allow learned causal structures to be validated empirically.
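Benchmarking against a known ground truth typically means comparing an estimated graph with the true one, edge by edge. The sketch below shows two common scores, directed-edge precision and recall and the structural Hamming distance (SHD), computed on adjacency matrices. The function names and the three-node example are hypothetical, and the summary above does not say which metrics the paper reports, so this is only one plausible way to score a result.

```python
import numpy as np


def edge_metrics(true_adj, est_adj):
    """Precision and recall over directed edges.

    Both inputs are square 0/1 matrices where entry [i, j] = 1 means i -> j.
    """
    t = np.asarray(true_adj, dtype=bool)
    e = np.asarray(est_adj, dtype=bool)
    tp = int(np.sum(t & e))    # edges present in both graphs
    fp = int(np.sum(~t & e))   # edges only in the estimate
    fn = int(np.sum(t & ~e))   # edges only in the ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


def structural_hamming_distance(true_adj, est_adj):
    """Number of node pairs whose edge status differs.

    A missing, extra, or reversed edge each counts once.
    """
    t = np.asarray(true_adj, dtype=int)
    e = np.asarray(est_adj, dtype=int)
    diff = np.abs(t - e)
    pair_differs = (diff + diff.T) > 0
    return int(np.sum(np.triu(pair_differs, k=1)))


# Ground truth: 0 -> 1, 1 -> 2.
true_graph = np.array([[0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 0]])
# Estimate: 0 -> 1 (correct), 2 -> 1 (reversed), 0 -> 2 (extra).
estimated_graph = np.array([[0, 1, 1],
                            [0, 0, 0],
                            [0, 1, 0]])

metrics = edge_metrics(true_graph, estimated_graph)
shd = structural_hamming_distance(true_graph, estimated_graph)
```

In this example one of three estimated edges is correct and one of two true edges is recovered, while the reversed edge and the extra edge each add one to the SHD.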
Abstract
Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To help address these challenges, we gather a complex dataset comprising measurements from an assembly line in a manufacturing context. This line consists of numerous physical processes for which we are able to provide ground truth causal relationships on the basis of a detailed study of the underlying physics. We use the assembly line data and associated ground truth information to build a system for generation of semisynthetic…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Machine Learning and Data Classification
Methods: Lib
