Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage
Marco H A Inácio

TL;DR
This paper introduces 'sstudy', a Python package that simplifies simulation studies for machine learning performance assessment by integrating SQL databases for efficient data storage and management.
Contribution
The paper presents a new Python package 'sstudy' that streamlines simulation studies using SQL databases, with features, usage examples, and application demonstrations.
Findings
sstudy simplifies simulation study setup and data management.
The package effectively integrates with SQL databases for storage.
Applications demonstrate improved efficiency in performance assessment.
Abstract
Performance assessment is a key issue in the process of proposing new machine learning/statistical estimators. One way to carry out such a task is through simulation studies, which can be defined as the procedure of estimating and comparing properties (such as predictive power) of estimators (and other statistics) by averaging over many replications given a true distribution; i.e., generating a dataset, fitting the estimator, calculating and storing the predictive power, then repeating the procedure many times and finally averaging over the stored predictive powers. In this paper, we present sstudy: a Python package designed to simplify the preparation of simulation studies using SQL database engines as the storage system; more specifically, we present its basic features, usage examples and references to its documentation. We also present a short statistical…
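The procedure described in the abstract (generate a dataset, fit the estimator, store the result, repeat, then average) can be sketched with the Python standard library alone. This sketch does not use the sstudy API: the table name `result`, the helper functions `run_replication` and `simulation_study`, the Normal(0, 1) true distribution, and the choice of the sample mean as the estimator are all assumptions made for illustration. It only shows the general idea of using an SQL database (here an in-memory SQLite database) as the storage layer of a simulation study.

```python
import random
import sqlite3
import statistics


def run_replication(n_obs, seed):
    """One replication: generate a dataset, fit the estimator, score it."""
    rng = random.Random(seed)
    true_mean = 0.0  # assumed true distribution: Normal(0, 1)
    data = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
    estimate = statistics.fmean(data)  # illustrative estimator: sample mean
    return (estimate - true_mean) ** 2  # squared error of this replication


def simulation_study(conn, sample_sizes, n_replications):
    """Run missing replications, store each one, return averaged results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result ("
        "n_obs INTEGER, replication INTEGER, sq_error REAL, "
        "PRIMARY KEY (n_obs, replication))"
    )
    for n_obs in sample_sizes:
        for rep in range(n_replications):
            # Skip replications already stored, so an interrupted
            # study can be resumed without redoing finished work.
            done = conn.execute(
                "SELECT 1 FROM result WHERE n_obs = ? AND replication = ?",
                (n_obs, rep),
            ).fetchone()
            if done:
                continue
            sq_error = run_replication(n_obs, seed=n_obs * 1_000_003 + rep)
            conn.execute(
                "INSERT INTO result VALUES (?, ?, ?)", (n_obs, rep, sq_error)
            )
            conn.commit()
    # Final step: average over the stored replications, per sample size.
    return conn.execute(
        "SELECT n_obs, AVG(sq_error), COUNT(*) FROM result "
        "GROUP BY n_obs ORDER BY n_obs"
    ).fetchall()


conn = sqlite3.connect(":memory:")
summary = simulation_study(conn, sample_sizes=[10, 100], n_replications=200)
for n_obs, mse, count in summary:
    print(f"n_obs={n_obs}: estimated MSE={mse:.4f} over {count} replications")
```

Storing one row per replication, keyed by its configuration, is what makes the study resumable: calling `simulation_study` again with the same connection only computes the rows that are still missing, and the averaging is delegated to the database with a single `GROUP BY` query.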
Taxonomy
Topics: Statistics Education and Methodologies · Computational Physics and Python Applications · Data Analysis with R
