
TL;DR
This paper introduces two Stata modules that leverage Python's Scikit-learn to perform hyper-parameter tuning and cross-validation for machine learning models within Stata, streamlining ML workflows for regression and classification.
Contribution
The paper presents new Stata modules that integrate with Python's Scikit-learn, enabling hyper-parameter tuning and cross-validation inside Stata through the Stata Function Interface (sfi), the Python integration platform introduced in Stata 16.
Findings
Effective integration of Stata and Python for ML tasks.
Automated hyper-parameter tuning via cross-validation.
Supports both regression and classification models.
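The findings above say that one tuning procedure serves both regression and classification. The sketch below illustrates that idea directly in Python with Scikit-learn, outside Stata. It does not reproduce the internals of r_ml_stata or c_ml_stata. The helper `tune_and_predict`, the k-nearest-neighbours learners, the synthetic data, and the grid values are all illustrative assumptions. The same K-fold grid search is applied to a regressor and to a classifier, and each tuned model is then used for prediction.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor


def tune_and_predict(estimator, param_grid, X, y, cv=5):
    """Hypothetical helper: K-fold grid search, refit on all data, then predict.

    With an integer cv, GridSearchCV uses KFold for regressors and
    StratifiedKFold for classifiers; the default score is R^2 or accuracy.
    """
    search = GridSearchCV(estimator, param_grid, cv=cv)
    search.fit(X, y)  # refit=True by default: best model is refit on all of X, y
    return search.best_params_, search.best_score_, search.predict(X)


grid = {"n_neighbors": [3, 5, 10, 20]}  # assumed search space

# Regression setting: predict a continuous outcome.
X_r, y_r = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg_params, reg_score, reg_pred = tune_and_predict(KNeighborsRegressor(), grid, X_r, y_r)

# Classification setting: same routine, now predicting class labels.
X_c, y_c = make_classification(n_samples=200, n_features=5, random_state=0)
clf_params, clf_score, clf_pred = tune_and_predict(KNeighborsClassifier(), grid, X_c, y_c)

print("regression:", reg_params, "mean CV R^2 =", round(reg_score, 3))
print("classification:", clf_params, "mean CV accuracy =", round(clf_score, 3))
```

Only the estimator and the default scoring rule change between the two settings. The cross-validation and prediction steps stay the same.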
Abstract
We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting popular Machine Learning (ML) methods in both regression and classification settings. Using the recent Stata/Python integration platform (sfi) of Stata 16, these commands tune hyper-parameters optimally via K-fold cross-validation with grid search. More specifically, they use the Python Scikit-learn API to carry out both cross-validation and outcome/label prediction.
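To make "K-fold cross-validation with grid search" concrete, the sketch below spells out the loop that such a search performs, written in plain Python with Scikit-learn. It is not the modules' code. The ridge regressor, the `alpha` grid, K = 5, mean squared error as the criterion, and the synthetic data are assumptions chosen for illustration. Each candidate value is scored by its average held-out error across the K folds. The best value is then refit on the full sample, and the refit model is used to predict the outcome.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, ParameterGrid

X, y = make_regression(n_samples=150, n_features=8, noise=5.0, random_state=1)

candidates = list(ParameterGrid({"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}))  # assumed grid
kf = KFold(n_splits=5, shuffle=True, random_state=1)  # K = 5 folds (assumption)

cv_mse = []  # mean held-out MSE for each candidate
for params in candidates:
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        # Fit on K-1 folds, evaluate on the held-out fold.
        model = Ridge(**params).fit(X[train_idx], y[train_idx])
        fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
    cv_mse.append(float(np.mean(fold_mse)))

# Pick the candidate with the lowest cross-validated error, refit on all data.
best = candidates[int(np.argmin(cv_mse))]
final_model = Ridge(**best).fit(X, y)
y_hat = final_model.predict(X)

print("best hyper-parameters:", best, "CV MSE:", round(min(cv_mse), 3))
```

`GridSearchCV` packages this loop into a single estimator. Writing it out shows that the chosen hyper-parameter depends only on held-out performance, never on in-sample fit.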
