# HTA - An open-source software for assigning head and tail positions to monomer SMILES in polymerization reactions

**Authors:** Brenda de Souza Ferrari, Ronaldo Giro, Mathias B. Steiner

PMC · DOI: 10.1186/s13321-025-01098-x · Journal of Cheminformatics · 2025-10-28

## TL;DR

This paper introduces HTA (HeadTailAssign), an open-source tool that assigns head and tail positions in monomer SMILES strings so that polymer informatics and AI prediction models can work from correctly linked monomers.

## Contribution

The novel HTA algorithm automates head and tail atom assignment in monomer SMILES using nucleophilicity-based reactivity analysis.

## Key findings

- HTA correctly predicted the polymer class for 204 of 206 monomer SMILES in the reference data set (99% accuracy).
- The algorithm correctly assigned head and tail atoms for 187 of the 206 monomer SMILES (91% accuracy).
- HTA was successfully applied to data pre-processing for polymerization reaction modeling.
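
The two percentages follow directly from the counts reported in the abstract. A minimal check of that arithmetic is sketched below; the function and variable names are illustrative and not taken from the HTA code.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly handled monomer SMILES."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts reported in the abstract for the 206-monomer reference data set.
TOTAL_MONOMERS = 206
class_accuracy = accuracy(204, TOTAL_MONOMERS)      # polymer class prediction
head_tail_accuracy = accuracy(187, TOTAL_MONOMERS)  # head and tail assignment

print(f"polymer class: {class_accuracy:.1%}")
print(f"head and tail: {head_tail_accuracy:.1%}")
```

Rounded to whole percentages, these values give the 99% and 91% quoted above.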

## Abstract

Artificial Intelligence (AI) techniques are transforming the computational discovery and design of polymers. The key enablers for polymer informatics are machine-readable molecular string representations of the building blocks of a polymer, i.e., the monomers. In monomer strings, such as SMILES, symbols at the head and tail atoms indicate the locations of bond formation during polymerization. Since the linking of monomers determines a polymer’s properties, the performance of AI prediction models will, ultimately, be limited by the accuracy of the head and tail assignments in the monomer SMILES. Considering the large number of polymer precursors available in chemical databases, reliable methods for the automated assignment of head and tail atoms are needed. Here, we report a method for assigning head and tail atoms in monomer SMILES by analyzing the reactivity of their functional groups based on the atomic index of nucleophilicity. In a reference data set containing 206 polymer precursors, the HeadTailAssign (HTA) algorithm correctly predicted the polymer class of 204 monomer SMILES, achieving an accuracy of 99%. The head and tail atoms were correctly assigned to 187 monomer SMILES, representing an accuracy of 91%. The HTA code is available for validation and reuse at https://github.com/IBM/HeadTailAssign.

The algorithm was successfully applied to data pre-processing by tagging the linkage bonds in monomers for defining the repeat units in polymerization reactions.
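
The abstract says that symbols at the head and tail atoms mark where bonds form during polymerization. This summary does not say which symbols HTA itself writes, so the sketch below assumes the `*` wildcard atom that is commonly used to mark connection points in polymer repeat-unit SMILES. The helper names are hypothetical and are not part of the HTA code. The sketch only locates the markers and checks that a string carries exactly two of them, as a linear repeat unit with one head and one tail would.

```python
import re

# Assumption: connection points are written as a bare `*` or a bracketed `[*]`.
WILDCARD = re.compile(r"\[\*\]|\*")


def attachment_points(smiles: str) -> list[int]:
    """Return the string offsets of wildcard atoms in a SMILES string."""
    return [match.start() for match in WILDCARD.finditer(smiles)]


def has_head_and_tail(smiles: str) -> bool:
    """True if the string carries exactly two connection-point markers."""
    return len(attachment_points(smiles)) == 2


# Ethylene monomer (no markers) versus a polyethylene-style repeat unit.
for candidate in ("C=C", "*CC*", "[*]CC([*])c1ccccc1"):
    print(candidate, has_head_and_tail(candidate))
```

A check like this can serve as a cheap sanity filter before marked strings are passed to a downstream model; it does not validate the chemistry of the assignment.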

## Full-text entities

- **Chemicals:** polymer (MESH:D011108)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12570827/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12570827/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12570827/full.md

---
Source: https://tomesphere.com/paper/PMC12570827