# Leveraging molecular descriptors and explainable machine learning for monomer conversion prediction in photoinduced electron transfer-reversible addition-fragmentation chain transfer polymerization

**Authors:** Berna Alemdag, Azra Kocaarslan, Gözde Kabay

PMC · DOI: 10.1038/s41598-025-33553-y · 2026-02-09

## TL;DR

This paper introduces a machine learning model that predicts monomer conversion in PET-RAFT polymerization from 2D molecular descriptors of the monomer, RAFT agent, and photocatalyst, and uses SHAP analysis to interpret which descriptors drive conversion.

## Contribution

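Rather than treating the polymer as a single unit or one-hot encoding the reaction components, the approach decomposes each PET-RAFT system into its monomer, RAFT agent, and photocatalyst, encodes each component with 2D molecular descriptors derived from SMILES, and applies explainable ML to predict and interpret monomer conversion.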
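A minimal sketch of the component-wise encoding idea, not the authors' implementation. The function names `encode_system` and `toy_describe`, the role prefixes, and the example SMILES strings are assumptions made for illustration; `toy_describe` returns simple string statistics as a stand-in for real 2D molecular descriptors, which would normally come from a cheminformatics toolkit.

```python
from typing import Callable, Dict

# A descriptor function maps one SMILES string to named numeric features.
DescriptorFn = Callable[[str], Dict[str, float]]


def encode_system(
    monomer: str,
    raft_agent: str,
    photocatalyst: str,
    describe: DescriptorFn,
) -> Dict[str, float]:
    """Encode each component separately and merge into one feature row.

    Prefixing every feature with its role keeps the monomer, RAFT agent,
    and photocatalyst descriptors distinct in the final feature table.
    """
    row: Dict[str, float] = {}
    components = (
        ("monomer", monomer),
        ("raft", raft_agent),
        ("photocatalyst", photocatalyst),
    )
    for role, smiles in components:
        for name, value in describe(smiles).items():
            row[f"{role}__{name}"] = value
    return row


def toy_describe(smiles: str) -> Dict[str, float]:
    """Placeholder only: string statistics, NOT real molecular descriptors."""
    return {
        "length": float(len(smiles)),
        "double_bonds": float(smiles.count("=")),
    }


# Arbitrary small molecules chosen only to exercise the function;
# they are not taken from the paper's dataset.
example_row = encode_system("C=CC(=O)OC", "CSC(=S)SC", "c1ccccc1", toy_describe)
```

Because the row is built from descriptors rather than one-hot identities, a component that never appeared in training can still be encoded, which is what makes prediction on unseen monomers possible in principle.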

## Key findings

- CatBoost was the top performer among the 10 ML algorithms evaluated, with R² = 0.84 (RMSE = 10.04 pps; MAE = 8.16 pps) for predicting monomer conversion.
- SHAP analysis showed that monomer topological complexity, electronic polarization, and molecular weight together account for over 60% of the model's predictive power.
- In external validation, the model generalized to unseen (meth)acrylates and (meth)acrylamides with an MAE of 8.03, comparable to its training-set performance.
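The three error metrics quoted in these findings follow standard definitions. The sketch below computes them in plain Python; the function name `regression_metrics` and the toy conversion values in the usage lines are assumptions for illustration and are not data from the paper.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Return R^2, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        # R^2 is undefined when every observed value is identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Toy conversions in percent; errors are therefore in percentage points.
toy_metrics = regression_metrics([50.0, 70.0, 90.0], [55.0, 65.0, 90.0])
```

When conversion is expressed in percent, RMSE and MAE come out in percentage points, which is presumably what the "pps" unit in the reported values denotes.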

## Abstract

This study presents a molecular descriptor-based machine learning (ML) architecture for predicting monomer conversion in photoinduced electron transfer-reversible addition-fragmentation chain transfer (PET-RAFT) polymerization systems. Unlike traditional polymer informatics approaches that treat polymers as single units or use one-hot encoding for reaction components, we decompose each PET-RAFT system into its individual parts: monomer, RAFT agent, and photocatalyst. Each component is then encoded separately using 2D molecular descriptors derived from SMILES. Using a literature-sourced dataset of 152 PET-RAFT systems, we systematically trained (with fivefold cross-validation, CV) and evaluated 10 ML algorithms. CatBoost showed greater stability across CV folds (SD = ±0.07) and was identified as the top performer for monomer conversion prediction (R² = 0.84; RMSE = 10.04 pps; MAE = 8.16 pps). SHapley Additive exPlanations (SHAP) analysis revealed mechanistically interpretable structure-property-performance relationships, highlighting that monomer topological complexity, electronic polarization, and molecular weight together account for over 60% of the model’s predictive power. External validation confirmed CatBoost’s ability to generalize to unseen (meth)acrylates and (meth)acrylamides (MAE = 8.03), with performance comparable to that on the training set. In practice, the learned descriptor-conversion mapping enables fast in silico screening and component ranking, highlighting actionable descriptor ranges and potentially accelerating design-build-test cycles for high-conversion PET-RAFT.
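The fivefold cross-validation over 152 systems mentioned in the abstract can be illustrated with a small index-splitting sketch. This is a generic k-fold scheme under stated assumptions, not the authors' pipeline: the names `kfold_indices` and `summarize_folds`, the shuffling seed, and the use of the sample standard deviation are all choices made here, since the abstract does not specify them.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def kfold_indices(
    n_samples: int, n_folds: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducible folds
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        validation = sorted(folds[k])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != k for idx in fold
        )
        splits.append((train, validation))
    return splits


def summarize_folds(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# 152 systems split into five folds, matching the dataset size in the abstract.
splits = kfold_indices(152, n_folds=5)
fold_sizes = [len(validation) for _, validation in splits]
```

With 152 samples the validation folds hold 31, 31, 30, 30, and 30 systems. A model would be fit on each training split and scored on the matching validation split, and `summarize_folds` would then give the across-fold mean and spread of the kind reported as "SD" in the abstract.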

The online version contains supplementary material available at 10.1038/s41598-025-33553-y.

## Full-text entities

- **Chemicals:** N, N-dimethylacrylamide (MESH:C099046), acrylates (MESH:D000179), metalloporphyrin (MESH:D008665), zinc(II) tetraphenylporphyrin (MESH:C048955), nitroxide (MESH:C039900), SR (MESH:D013324), ZnTPP (MESH:C076448), DMSO (MESH:D004121), BzMA (MESH:C512584), (meth)acrylate (MESH:D008689), MA (MESH:C035956), HMPA (MESH:D006492), 4-cyano-4-[(dodecylsulfanylthiocarbonyl)sulfanyl]pentanoic acid (MESH:C000709520), 2-hydroxypropyl methacrylamide (MESH:C032976), (meth)acrylamide (MESH:C045985), S (MESH:D013455), MMA (MESH:D020366), 2-(butylthiocarbonothioylthio) propionic acid (-), trithiocarbonates (MESH:C013321), acrylamides (MESH:D000178), amide (MESH:D000577), N-methylacrylamide (MESH:C491267), acrylamide (MESH:D020106), xanthates (MESH:C004918), zinc (MESH:D015032), oxygen (MESH:D010100), acrylate (MESH:C036658), metal (MESH:D008670), polymer (MESH:D011108), C (MESH:D002244), HPMA (MESH:C032802), ester (MESH:D004952)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12894868/full.md

---
Source: https://tomesphere.com/paper/PMC12894868