Machine Learning using Stata/Python

Giovanni Cerulli

arXiv:2103.03122·stat.CO·March 5, 2021

TL;DR

This paper introduces two Stata modules that leverage Python's Scikit-learn to perform hyper-parameter tuning and cross-validation for machine learning models within Stata, streamlining ML workflows for regression and classification.

Contribution

The paper presents two new Stata modules, r_ml_stata and c_ml_stata, that call Python's Scikit-learn through the Stata/Python integration platform (sfi) of Stata 16 to tune hyper-parameters by K-fold cross-validation and to predict outcomes or labels.

Findings

01. Effective integration of Stata and Python for ML tasks.

02. Automated hyper-parameter tuning via cross-validation.

03. Supports both regression and classification models.
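
The idea of tuning a hyper-parameter by cross-validation can be sketched in a few lines of plain Python. This is an illustration only, not the modules' code: the synthetic data, the k-nearest-neighbour classifier, the candidate grid, and all function names (make_data, knn_predict, kfold_accuracy, grid_search) are assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter


def make_data(n=120, seed=0):
    """Synthetic two-class data: two Gaussian blobs in 2-D (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.5
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kfold_accuracy(data, k_neighbors, n_folds=5):
    """Mean held-out accuracy over n_folds contiguous folds."""
    fold_size = len(data) // n_folds
    scores = []
    for f in range(n_folds):
        start, stop = f * fold_size, (f + 1) * fold_size
        test = data[start:stop]
        train = data[:start] + data[stop:]
        hits = sum(knn_predict(train, p, k_neighbors) == y for p, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def grid_search(data, grid, n_folds=5):
    """Score every candidate value by K-fold CV and keep the best one."""
    cv_scores = {k: kfold_accuracy(data, k, n_folds) for k in grid}
    best_k = max(cv_scores, key=cv_scores.get)
    return best_k, cv_scores


data = make_data()
best_k, cv_scores = grid_search(data, grid=[1, 3, 5, 7, 9])
print("best k:", best_k, "CV accuracy:", round(cv_scores[best_k], 3))
```

Each candidate value is scored only on folds that were held out of training, so the selected value reflects out-of-sample accuracy rather than fit to the training data.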

Abstract

We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting popular Machine Learning (ML) methods in both regression and classification settings. Using the recent Stata/Python integration platform (sfi) of Stata 16, these commands provide optimal hyper-parameter tuning via K-fold cross-validation using grid search. More specifically, they use the Python Scikit-learn API to carry out both cross-validation and outcome/label prediction.
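
On the Python side, grid search with K-fold cross-validation followed by prediction might look roughly like the sketch below. It uses Scikit-learn's GridSearchCV and KFold directly; the synthetic data, the decision-tree learner, and the max_depth grid are assumptions for illustration, and the sketch does not reproduce how r_ml_stata or c_ml_stata are implemented or how they exchange data with Stata through sfi.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumed; stands in for a Stata dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate hyper-parameter values to compare (illustrative grid).
param_grid = {"max_depth": [2, 4, 6, 8]}

# 5-fold cross-validation with shuffling for reproducible splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# By default GridSearchCV refits the best model on all the data,
# so it can be used directly for outcome prediction.
best_depth = search.best_params_["max_depth"]
predictions = search.predict(X)
print("best max_depth:", best_depth)
print("CV mean squared error:", -search.best_score_)
```

For a classification setting, the same pattern should apply with a classifier (for example DecisionTreeClassifier) and a classification metric such as scoring="accuracy".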
