DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation
Ghadi S. Al Hajj, Johan Pensar, Geir Kjetil Sandve

TL;DR
DagSim is a flexible Python framework that enables DAG-based data simulation with unconstrained variable types and relations, promoting transparency and modularity for complex data scenarios.
Contribution
DagSim introduces a DAG-based simulation framework that supports arbitrary variable types and functional relations, whereas existing DAG-based simulation frameworks are confined to relatively simple variable types and functional forms.
Findings
Supports complex variable types and relations
Uses a transparent YAML model specification
Demonstrates applications in image and bio-sequence data
Abstract
Data simulation is fundamental for machine learning and causal inference, as it allows exploration of scenarios and assessment of methods in settings with full control of ground truth. Directed acyclic graphs (DAGs) are well established for encoding the dependence structure over a collection of variables in both inference and simulation settings. However, while modern machine learning is applied to data of an increasingly complex nature, DAG-based simulation frameworks are still confined to settings with relatively simple variable types and functional forms. We here present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency, while separate user-provided functions for generating each variable based on its parents ensure…
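The idea described in the abstract, a DAG that fixes the dependence structure plus one user-provided function per variable that generates it from its parents, can be illustrated with a minimal sketch. This is not DagSim's API: the `simulate` helper, the model dictionary layout, and the three sampling functions are all hypothetical names chosen for illustration, and only the Python standard library is used. The sketch assumes Python 3.9 or later for `graphlib`.

```python
import random
from graphlib import TopologicalSorter


def simulate(nodes, num_samples, seed=0):
    """Draw samples from a DAG model (hypothetical helper, not DagSim's API).

    nodes maps each variable name to a (function, parent_names) pair.
    Each function receives a random generator followed by the values
    of its parents, and may return a value of any type.
    """
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors; static_order()
    # yields parents before children and raises CycleError on a cycle.
    graph = {name: parents for name, (_, parents) in nodes.items()}
    order = list(TopologicalSorter(graph).static_order())

    samples = []
    for _ in range(num_samples):
        values = {}
        for name in order:
            func, parents = nodes[name]
            values[name] = func(rng, *(values[p] for p in parents))
        samples.append(values)
    return samples


# Hypothetical user-provided functions, one per variable.
def sample_length(rng):
    return rng.randint(5, 10)  # integer-valued root node


def sample_label(rng):
    return rng.choice(["case", "control"])  # categorical root node


def sample_sequence(rng, length, label):
    # String-valued node: a random nucleotide sequence whose content
    # depends on both parents.
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    if label == "case":
        motif = "TATA"
        pos = rng.randrange(length - len(motif) + 1)
        # Overwrite part of the sequence with the motif; length is unchanged.
        seq = seq[:pos] + motif + seq[pos + len(motif):]
    return seq


# Model structure: each variable lists its generating function and parents.
model = {
    "length": (sample_length, []),
    "label": (sample_label, []),
    "sequence": (sample_sequence, ["length", "label"]),
}

samples = simulate(model, num_samples=3, seed=1)
```

Keeping the structure (the `model` dictionary) separate from the generating functions mirrors the separation the abstract describes between the model specification and the user-provided functions. Here the structure is a Python dictionary rather than a YAML file.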
Taxonomy
Topics: Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI) · Computational Physics and Python Applications
