# Machine Learning Prediction of Laccase‐Catalyzed Oxidation of Aromatic Compounds Using Curated Enzyme‐Specific Datasets

**Authors:** Yulia Kulagina, Christian Goldhahn, Ramon Weishaupt, Mark Schubert

PMC · DOI: 10.1002/jcc.70344 · 2026-03-06

## TL;DR

This paper uses machine learning to predict which aromatic compounds can be oxidized by specific laccase enzymes, so that laccase-substrate combinations can be pre-screened before bench experiments.

## Contribution

The study introduces a machine learning framework with interpretable models and a visualization tool for predicting laccase-substrate compatibility.

## Key findings

- The tested models performed comparably overall; random forest (RFC) was the most stable across data splits and laccases.
- RFC feature-importance and ChemBERTa attention analyses highlighted molecular features associated with oxidation outcomes.
- A lightweight tool visualizes ChemBERTa predictions by mapping SMILES attributions onto molecular graphs.
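
One way to quantify the stability across data splits mentioned in the first finding is to repeat a stratified train/test split many times and report the spread of the resulting scores. The following is a minimal sketch under stated assumptions, not the authors' evaluation protocol: the function names, the use of accuracy as the metric, the split sizes, and the toy labels are all hypothetical, and the majority-class baseline merely stands in for a real classifier such as a random forest.

```python
import random
import statistics


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def split_stability(features, labels, fit_predict,
                    n_splits=20, test_fraction=0.25, seed=0):
    """Mean and standard deviation of accuracy over repeated random splits.

    fit_predict(train_x, train_y, test_x) is any callable that trains a
    model and returns predicted labels for test_x.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        train, test = stratified_split(labels, test_fraction, rng)
        preds = fit_predict([features[i] for i in train],
                            [labels[i] for i in train],
                            [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return statistics.mean(scores), statistics.stdev(scores)


def majority_baseline(train_x, train_y, test_x):
    """Placeholder model: always predict the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_x)


# Toy data (hypothetical): 12 oxidized (1) and 8 non-oxidized (0) substrates.
toy_labels = [1] * 12 + [0] * 8
toy_features = [[0.0] for _ in toy_labels]
mean_acc, sd_acc = split_stability(toy_features, toy_labels, majority_baseline)
```

A smaller standard deviation for one model than for another, on the same splits, would indicate more stable behaviour in the sense used above. The same loop could be run once per laccase dataset to compare stability across enzymes.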

## Abstract

Laccases are multi‐copper oxidase enzymes that oxidize a wide range of aromatic and non‐aromatic compounds using molecular oxygen, producing water as the sole byproduct and making them attractive biocatalysts for green chemistry. However, the ability of laccases to oxidize specific substrates depends on a complex interplay of molecular structure, enzyme properties, redox potential, and environmental context, making laccase–substrate compatibility hard to predict. We apply machine learning models to pre‐screen laccase–substrate combinations, streamlining experimental workflows. We evaluate four classical classifiers and a transformer‐based model (ChemBERTa) on three in‐house curated datasets of aromatic substrates with oxidation profiles for distinct laccases. Overall, the tested models achieve comparable performance, with random forest (RFC) demonstrating more stability across different data splits and laccases. This assessment is complemented by RFC feature‐importance and ChemBERTa attention analyses, which highlight molecular features associated with oxidation outcomes. We also introduce a lightweight tool to visualize ChemBERTa predictions by mapping SMILES attributions onto molecular graphs. These findings provide a robust, interpretable framework for accelerating laccase–substrate discovery.

We curate laccase‐substrate datasets and train five classifiers, from regularized logistic regression to tree‐based models and ChemBERTa, to predict whether a substrate will be oxidized. Feature importance and attention maps projected onto molecular substructures make the predictions interpretable and useful for pre‐screening before the bench.
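
The idea of mapping SMILES attributions onto a molecular graph can be illustrated with a short sketch. It assumes one attribution score per SMILES token and uses a simplified regex tokenizer; both are assumptions made for illustration, and neither the tokenizer nor the function names come from the paper or from ChemBERTa. Scores on non-atom tokens (bonds, ring-closure digits, parentheses) are simply dropped here, which is only one of several possible choices.

```python
import re

# Simplified SMILES tokenizer (an assumption): bracket atoms, two-letter
# halogens, organic-subset atoms, aromatic atoms, ring closures, bond and
# branch symbols. It is not the tokenizer used by ChemBERTa.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[=#\-+()/\\.:~@]"
)
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]")


def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def atom_attributions(smiles, token_scores):
    """Return (atom_index, symbol, score) for every atom token.

    Atom indices follow the order in which atoms appear in the string,
    so they can be used to colour atoms in a drawing of the same SMILES.
    """
    tokens = tokenize(smiles)
    if len(tokens) != len(token_scores):
        raise ValueError("need exactly one score per token")
    atoms = []
    for token, score in zip(tokens, token_scores):
        if ATOM_TOKEN.fullmatch(token):
            atoms.append((len(atoms), token, score))
    return atoms


# Phenol with made-up scores: nine tokens, seven of which are atoms.
phenol_scores = [0.1, 0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.5]
phenol_atoms = atom_attributions("c1ccccc1O", phenol_scores)
```

In this toy example the hydroxyl oxygen ends up as atom 6 with the highest score. A subword tokenizer would additionally require aligning model tokens to characters before this step, which the sketch does not attempt.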

## Linked entities

- **Chemicals:** molecular oxygen (PubChem CID 977), water (PubChem CID 962)

## Full-text entities

- **Chemicals:** ether (MESH:D004986), alcohols (MESH:D000438), triazoles (MESH:D014230), lignin (MESH:D008031), phenols (MESH:D010636), amines (MESH:D000588), Aromatic Compounds (-), aldehydes (MESH:D000447), copper (MESH:D003300), indoles (MESH:D007211), imidazoles (MESH:D007093), benzene (MESH:D001554), water (MESH:D014867), carbon (MESH:D002244), ketones (MESH:D007659), sulfonamides (MESH:D013449), nitrogen (MESH:D009584), oxygen (MESH:D010100), -mth (MESH:D008926)
- **Species:** Thermothelomyces thermophilus (species) [taxon 78579], Homo sapiens (human, species) [taxon 9606], Bacillus pumilus (species) [taxon 1408], Escherichia coli (E. coli, species) [taxon 562], Trametes versicolor (turkey-tail fungus, species) [taxon 5325]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12966443/full.md

---
Source: https://tomesphere.com/paper/PMC12966443