Random forest models of the retention constants in the thin layer chromatography
Miron B. Kursa, Łukasz Komsta, Witold R. Rudnicki

TL;DR
This study applies random forest machine learning models with feature selection to predict retention constants in thin layer chromatography, outperforming linear regression and demonstrating robustness through cross-validation.
Contribution
It introduces a novel application of random forest models with feature selection for TLC retention prediction, improving accuracy over traditional methods.
Findings
Random forest models outperform linear regression in predicting retention constants.
Feature selection significantly reduces the number of descriptors used by the models.
Models demonstrate robustness validated by cross-validation.
Abstract
In this study we examine the application of machine learning methods to modelling retention constants in thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors covering various molecular properties, most of them redundant or irrelevant for predicting the retention constant. Hence we employed feature selection to significantly reduce the number of attributes. Additionally, we tested the application of a bagging procedure to the feature selection. Random forest regression models were built using the selected variables. The resulting models correlate better with the experimental data than the reference models obtained with linear regression. Cross-validation confirms the robustness of the models.
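The idea of applying bagging to feature selection can be illustrated with a minimal sketch. The abstract does not name the selection method, so a simple Pearson-correlation filter stands in for it here; the function names, thresholds, and synthetic data are all assumptions made for illustration and are not taken from the paper. The selector is run on many bootstrap resamples, and a descriptor is kept only if it is chosen in a large share of them.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_features(X, y, threshold):
    """Stand-in selector (assumption): keep columns with |r| >= threshold."""
    n_features = len(X[0])
    return {
        j for j in range(n_features)
        if abs(pearson([row[j] for row in X], y)) >= threshold
    }


def bagged_selection(X, y, n_rounds=50, threshold=0.4, min_share=0.8, seed=0):
    """Run the selector on bootstrap resamples and keep stable descriptors.

    A descriptor is kept if it was selected in at least `min_share`
    of the `n_rounds` resamples. All default values are illustrative.
    """
    rng = random.Random(seed)
    n = len(X)
    votes = [0] * len(X[0])
    for _ in range(n_rounds):
        # Bootstrap: draw n rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = select_features(
            [X[i] for i in idx], [y[i] for i in idx], threshold
        )
        for j in chosen:
            votes[j] += 1
    return [j for j, v in enumerate(votes) if v / n_rounds >= min_share]


def make_toy_data(n_samples=100, n_features=20, seed=1):
    """Synthetic data: only descriptors 0 and 1 influence the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [2.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.3) for row in X]
    return X, y


X, y = make_toy_data()
selected = bagged_selection(X, y)
print("stable descriptors:", selected)
```

In a setting like the one the abstract describes, the surviving descriptors would then be passed to a random forest regressor and compared against a linear regression baseline under cross-validation. Voting across resamples is one way to discard descriptors that look relevant only by chance in a single sample.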
Taxonomy
Topics: Analytical Chemistry and Chromatography · Spectroscopy and Chemometric Analyses · Advanced Chemical Sensor Technologies
