TreeGen -- a Monte Carlo generator for data frames
Agnieszka Niemczynowicz, Gabriela Białoskórska, Joanna Nieżurawska-Zając, Radosław A. Kycia

TL;DR
TreeGen introduces a probabilistic tree data structure and Monte Carlo generator for data frames, enabling data augmentation, compression, and hierarchical modeling while preserving statistical properties.
Contribution
The paper presents the probability tree data structure and a Monte Carlo generator, extending decision trees to handle multiple choices with probabilities for data science applications.
Findings
Supports increasing data multiplicity
Enables data compression that preserves statistical information
Facilitates hierarchical data modeling
Abstract
A typical problem in data science is to create a structure that encodes how often unique elements occur in the rows of a data frame and how different rows relate to each other. We present the probability tree, an abstract data structure that extends the decision tree by allowing more than two choices at a node, each with an assigned probability. Such a tree represents statistical relations between different rows of the data frame. The probability tree is paired with the Generator module, a Monte Carlo generator that traverses the tree. Both components are implemented in the TreeGen Python package. The package can be used to increase data multiplicity, compress data while preserving its statistical information, construct hierarchical models, explore data, and extract features.
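To make the idea concrete, the following is a minimal sketch of a probability tree and a Monte Carlo traversal. It is not the TreeGen API: the names `ProbNode`, `build_tree`, and `sample_row` are hypothetical. The sketch also assumes that each tree level corresponds to one column, that all rows have the same length, and that branch probabilities are the observed relative frequencies.

```python
import random
from collections import Counter


class ProbNode:
    """Hypothetical probability-tree node (not the TreeGen class)."""

    def __init__(self):
        self.counts = Counter()  # value -> number of rows taking that branch
        self.children = {}       # value -> child ProbNode


def build_tree(rows):
    """Insert each row as a root-to-leaf path, counting branch usage."""
    root = ProbNode()
    for row in rows:
        node = root
        for value in row:
            node.counts[value] += 1
            node = node.children.setdefault(value, ProbNode())
    return root


def sample_row(root, rng):
    """Walk from the root, picking each branch in proportion to its count."""
    row = []
    node = root
    while node.counts:  # a node with no counts is a leaf
        values = list(node.counts)
        weights = [node.counts[v] for v in values]
        value = rng.choices(values, weights=weights, k=1)[0]
        row.append(value)
        node = node.children[value]
    return tuple(row)


# Toy data frame given as a list of rows (assumed input format).
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 1)]
tree = build_tree(rows)

rng = random.Random(0)  # seeded for reproducibility
generated = [sample_row(tree, rng) for _ in range(1000)]
```

In this sketch the tree stores each distinct row prefix once, together with its count, which is the sense in which repeated rows are compressed. Sampling many rows from it increases data multiplicity while following the empirical branch frequencies.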
Taxonomy
Topics: Scientific Research Methodologies and Applications · Computational Physics and Python Applications · Plant Water Relations and Carbon Dynamics
