Randomness control and reproducibility study of random forest algorithm in R and Python
Louisa Camadini, Yanis Bouzid, Maeva Merlet, Léopold Carron

TL;DR
This study examines the reproducibility of random forest algorithms across R and Python implementations, focusing on controlling randomness to ensure consistent results in toxicological assessments.
Contribution
It compares four random forest packages and identifies key parameters and randomness sources affecting reproducibility across implementations.
Findings
Reproducibility depends on consistent PRNG and parameter settings.
Differences in implementations can lead to variability in results.
Guidelines for ensuring reproducibility across R and Python are proposed.
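The first finding can be illustrated with a minimal sketch. It is not the internals of any of the four packages: the helper name `draw_forest_randomness` and its parameters are hypothetical, and it only mimics the two classic sources of randomness in a random forest (bootstrap sampling of rows and random selection of candidate features), using Python's standard-library generator.

```python
import random


def draw_forest_randomness(seed, n_rows, n_features, mtry, n_trees):
    """Return the random draws a toy forest would make for a given seed.

    Hypothetical helper for illustration only: for each tree it draws a
    bootstrap sample of row indices and a single feature subset of size
    mtry (real forests redraw candidate features at every split).
    """
    rng = random.Random(seed)  # dedicated generator, no hidden global state
    draws = []
    for _ in range(n_trees):
        # Bootstrap: sample n_rows row indices with replacement.
        bootstrap_rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Feature subsampling: mtry distinct features, without replacement.
        feature_subset = sorted(rng.sample(range(n_features), mtry))
        draws.append((bootstrap_rows, feature_subset))
    return draws


# Same seed and same parameters: the draws can be compared directly.
same_a = draw_forest_randomness(seed=42, n_rows=10, n_features=6, mtry=2, n_trees=3)
same_b = draw_forest_randomness(seed=42, n_rows=10, n_features=6, mtry=2, n_trees=3)
# A different seed gives a different stream of draws.
other = draw_forest_randomness(seed=43, n_rows=10, n_features=6, mtry=2, n_trees=3)

print("same seed, identical draws:", same_a == same_b)
print("different seed, identical draws:", same_a == other)
```

In the real packages the analogous controls are, for example, the `random_state` argument of Scikit-Learn's `RandomForestClassifier()`, `set.seed()` before calling randomForest, and the `seed` argument of ranger. Because R and Python do not share a generator or a seeding scheme by default, passing the same seed value on both sides should not be expected to yield the same stream of draws.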
Abstract
When it comes to the safety of cosmetic products, compliance with regulatory standards is crucial to guarantee consumer protection against the risks of skin irritation. Toxicologists must therefore be fully conversant with all risks. This applies not only to their day-to-day work, but also to all the algorithms they integrate into their routines. Recognizing this, ensuring the reproducibility of algorithms becomes one of the most crucial aspects to address. However, how can we prove the robustness of an algorithm such as the random forest, which relies heavily on randomness? In this report, we will discuss the strategy of integrating random forest into ocular tolerance assessment for toxicologists. We will compare four packages: randomForest and Ranger (R packages), adapted in Python via the SKRanger package, and the widely used Scikit-Learn with the RandomForestClassifier() function. Our goal is…
Taxonomy
Topics: Remote Sensing and LiDAR Applications
