PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Ruheng Wang; Hang Zhang; Trieu Nguyen; Shasha Feng; Hao-Wei Pang; Xiang Yu; Li Xiao; Peter Zhiping Zhang

arXiv:2508.14765 · cs.LG · March 30, 2026 · 3 cites

TL;DR

PepThink-R1 is an LLM-based framework for interpretable cyclic peptide design that combines chain-of-thought (CoT) supervised fine-tuning with reinforcement learning (RL) to optimize multiple pharmacological properties.

Contribution

The paper integrates LLMs with explicit reasoning about monomer-level modifications and RL for peptide optimization, improving interpretability and control over the optimized properties.

Findings

01. PepThink-R1 outperforms existing models in peptide property optimization.

02. The framework enables interpretable sequence modifications.

03. Generated peptides show improved lipophilicity, stability, and exposure.

Abstract

Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming…
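The abstract mentions a reward function that balances chemical validity against property improvements but does not give its form. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a hard validity gate followed by a weighted sum of property gains over the parent peptide. The names `Properties` and `reward`, the weights, and the penalty value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Predicted property scores for one peptide (hypothetical container)."""
    lipophilicity: float
    stability: float
    exposure: float


def reward(parent, candidate, is_valid,
           weights=(1.0, 1.0, 1.0), invalid_penalty=-1.0):
    """Score a candidate peptide relative to its parent.

    Assumed form: chemically invalid candidates get a fixed penalty;
    valid ones get the weighted sum of per-property improvements.
    """
    if not is_valid:
        return invalid_penalty
    deltas = (
        candidate.lipophilicity - parent.lipophilicity,
        candidate.stability - parent.stability,
        candidate.exposure - parent.exposure,
    )
    return sum(w * d for w, d in zip(weights, deltas))


if __name__ == "__main__":
    parent = Properties(1.0, 1.0, 1.0)
    better = Properties(2.0, 1.0, 1.0)
    print(reward(parent, better, is_valid=True))
    print(reward(parent, better, is_valid=False))
```

In this sketch the validity check runs first, so a candidate cannot earn credit for property gains unless it is chemically valid. How the paper actually weighs the two terms is not stated in the text above.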
