TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation

Jacob Ede Levine; Yun Lyan Luo; Sai Chandra Kosaraju

arXiv:2601.04521 · cs.LG · January 19, 2026

TL;DR

TSSR is a novel two-stage reinforcement learning framework that improves the validity, diversity, and chemical correctness of character-level SMILES molecule generation by combining syntax repair and chemistry-aware feedback.

Contribution

The paper introduces a two-stage, reward-driven RL method that improves SMILES generation without task-specific labels or handcrafted grammars.

Findings

01 Significantly improves syntactic and chemical validity of generated molecules

02 Preserves drug-likeness and synthesizability while increasing diversity

03 Enhances molecule validity and novelty in both pure and fine-tuned RL settings

Abstract

The design of reliable, valid, and diverse molecules is fundamental to modern drug discovery, as improved molecular generation supports efficient exploration of the chemical space for potential drug candidates and reduces the cost of early design efforts. Despite these needs, current chemical language models that generate molecules as SMILES strings are vulnerable to compounding token errors: many samples are unparseable or chemically implausible, and hard constraints meant to prevent failure can restrict exploration. To address this gap, we introduce TSSR, a Two-Stage, Swap-Reward-driven reinforcement learning (RL) framework for character-level SMILES generation. Stage one rewards local token swaps that repair syntax, promoting transitions from invalid to parseable strings. Stage two provides chemistry-aware feedback from RDKit diagnostics, rewarding reductions in valence, aromaticity,…
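The stage-one idea in the abstract (reward a local swap when it turns an invalid string into a parseable one) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that a "local token swap" means exchanging two adjacent characters, and it uses a hypothetical toy checker, `toy_is_parseable`, in place of a real SMILES parser; in practice a call such as RDKit's `Chem.MolFromSmiles`, which returns `None` for unparseable input, would likely play that role. The function names and the `bonus` value are also illustrative assumptions.

```python
from typing import Iterator, Optional, Tuple


def toy_is_parseable(smiles: str) -> bool:
    """Toy stand-in for a SMILES parser (assumption, not a real validator).

    Checks only: non-empty, starts with a letter or '[', balanced
    parentheses, no empty branch '()', and paired single-digit ring labels.
    """
    if not smiles or not (smiles[0].isalpha() or smiles[0] == "["):
        return False
    if "()" in smiles:
        return False
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            # First occurrence opens a ring label, second occurrence closes it.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


def swap_reward(before: str, after: str, bonus: float = 1.0) -> float:
    """Give `bonus` only for an invalid -> parseable transition (hypothetical scale)."""
    if not toy_is_parseable(before) and toy_is_parseable(after):
        return bonus
    return 0.0


def adjacent_swaps(smiles: str) -> Iterator[Tuple[int, str]]:
    """Yield (index, candidate) for every exchange of neighbouring characters."""
    for i in range(len(smiles) - 1):
        chars = list(smiles)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        yield i, "".join(chars)


def best_repair(smiles: str) -> Optional[Tuple[int, str, float]]:
    """Return the first adjacent swap with the highest reward, or None if no swap exists."""
    best = None
    for i, candidate in adjacent_swaps(smiles):
        reward = swap_reward(smiles, candidate)
        if best is None or reward > best[2]:
            best = (i, candidate, reward)
    return best


if __name__ == "__main__":
    print(best_repair("(CC)O"))
```

Here the string `(CC)O` fails the toy check because it opens with a branch, and exchanging its first two characters yields `C(C)O`, which passes. In an RL loop, a signal of this kind would be one term of the stage-one reward; how it is combined with the stage-two chemistry feedback is not specified in the truncated abstract and is left out of the sketch.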

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Chemical Synthesis and Analysis