# Machine-Learning-Based Prediction of Plant Cuticle–Air Partition Coefficients for Organic Pollutants: Revealing Mechanisms from a Molecular Structure Perspective

**Authors:** Tianyun Tao, Cuicui Tao, Tengyi Zhu

PMC · DOI: 10.3390/molecules29061381 · Molecules · 2024-03-20

## TL;DR

This paper builds machine-learning QSPR models that predict plant cuticle–air partition coefficients of organic pollutants from molecular structure, supporting ecological risk assessment.

## Contribution

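The study develops eight QSPR models using four algorithms (MLR, MLP, KNN, and GBDT) on 255 measured values covering 106 compounds and 25 plant species, and applies SHAP to the GBDT models to explain which molecular properties drive pollutant adsorption by plant cuticles.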

## Key findings

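- The GBDT-2 model performed best (adjusted R² of 0.925, leave-one-out Q² of 0.756, external Q² of 0.811, CCC of 0.891) and is the recommended model.
- SHAP interpretation of GBDT-1 and GBDT-2 shows that molecular size, polarizability, and molecular complexity affect how strongly plant cuticles adsorb airborne organic pollutants.
- The models can inform assessments of the environmental fate of organic pollutants and support eco-friendly, sustainable chemical engineering.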
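
The validation statistics cited for QSPR models of this kind can be computed in a few lines. The sketch below is illustrative only: the function names are hypothetical, the external Q² uses the common variant referenced to the training-set mean (an assumption, since this summary does not say which variant the paper uses), and CCC follows Lin's concordance correlation coefficient.

```python
from statistics import fmean


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_descriptors):
    """R2 penalized for the number of descriptors in the model."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_descriptors - 1)


def q2_ext(y_train, y_test, y_pred):
    """External Q2 referenced to the training-set mean (assumed variant)."""
    mean_train = fmean(y_train)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    ss_tot = sum((t - mean_train) ** 2 for t in y_test)
    return 1.0 - ss_res / ss_tot


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions; not data from the paper.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    print(adjusted_r2(observed, predicted, n_descriptors=1))
    print(ccc(observed, predicted))
```

Unlike R², CCC penalizes systematic offset between predicted and observed values, which is why it is often reported alongside external Q² when judging predictive agreement.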

## Abstract

Accurately predicting plant cuticle–air partition coefficients ($K_{ca}$) is essential for assessing the ecological risk of organic pollutants and elucidating their partitioning mechanisms. The current work collected 255 measured $K_{ca}$ values from 25 plant species and 106 compounds (dataset (I)) and averaged them to establish a dataset (dataset (II)) containing $K_{ca}$ values for 106 compounds. Machine-learning algorithms (multiple linear regression (MLR), multi-layer perceptron (MLP), k-nearest neighbors (KNN), and gradient-boosting decision tree (GBDT)) were applied to develop eight QSPR models for predicting $K_{ca}$. The results showed that the developed models had a high goodness of fit, as well as good robustness and predictive performance. The GBDT-2 model ($R^2_{\mathrm{adj}}$ = 0.925, $Q^2_{\mathrm{LOO}}$ = 0.756, $Q^2_{\mathrm{BOOT}}$ = 0.864, $R^2_{\mathrm{ext}}$ = 0.837, $Q^2_{\mathrm{ext}}$ = 0.811, and CCC = 0.891) is recommended as the best model for predicting $K_{ca}$ due to its superior performance. Moreover, interpreting the GBDT-1 and GBDT-2 models based on the Shapley additive explanations (SHAP) method elucidated how molecular properties, such as molecular size, polarizability, and molecular complexity, affected the capacity of plant cuticles to adsorb organic pollutants in the air. The satisfactory performance of the developed models suggests that they have the potential for extensive applications in guiding the environmental fate of organic pollutants and promoting the progress of eco-friendly and sustainable chemical engineering.
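
As a rough illustration of two steps the abstract describes, the sketch below collapses per-species measurements into one averaged value per compound (dataset (I) to dataset (II)) and computes a leave-one-out Q² for a toy k-nearest-neighbors regressor. Everything here is an assumption made for illustration: the record layout, the function names, the single numeric descriptor, and averaging on a log scale are hypothetical, and the paper's actual descriptors and model settings are not given in this summary.

```python
from collections import defaultdict
from statistics import fmean


def average_by_compound(records):
    """Average replicate values per compound.

    records: iterable of (compound, species, log_kca) tuples (assumed layout).
    Returns {compound: mean log_kca}.
    """
    grouped = defaultdict(list)
    for compound, _species, log_kca in records:
        grouped[compound].append(log_kca)
    return {compound: fmean(values) for compound, values in grouped.items()}


def knn_predict(train_x, train_y, x, k=3):
    """Predict by averaging the k training targets nearest to x (1-D descriptor)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return fmean(y for _, y in nearest)


def q2_loo(xs, ys, k=3):
    """Leave-one-out Q2: refit without each sample, then predict it."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        press += (ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2
    mean_y = fmean(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Placeholder records with invented values; not measurements from the paper.
    dataset_i = [
        ("compound_A", "species_1", 6.0),
        ("compound_A", "species_2", 8.0),
        ("compound_B", "species_1", 5.0),
    ]
    dataset_ii = average_by_compound(dataset_i)
    print(dataset_ii)

    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical single descriptor
    target = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(q2_loo(descriptor, target, k=2))
```

Averaging removes species-to-species variation from the target, so a model trained on the averaged set describes compound-level behavior, while a model trained on the unaveraged set also has to absorb differences between plants.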

## Full-text entities

- **Diseases:** toxicity (MESH:D064420), injury to people or property (MESH:C000719191), oral (MESH:D020820)
- **Chemicals:** water (MESH:D014867), Hexachlorobenzene (MESH:D006581), Decachlorobiphenyl (MESH:C005381), Tetrachlorobiphenyl (-), phospholipid (MESH:D010743), hydrogen (MESH:D006859)
- **Species:** Homo sapiens (human, species) [taxon 9606], Rattus norvegicus (brown rat, species) [taxon 10116]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10975432/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10975432/full.md

## References

85 references — full list in the complete paper: https://tomesphere.com/paper/PMC10975432/full.md

---
Source: https://tomesphere.com/paper/PMC10975432