STRIDE: Post-Training LLMs to Reason and Refine Bio-Sequences via Edit Trajectories
Daiheng Zhang, Shiyang Zhang, Sizhuang He, Yangtian Zhang, Syed Asad Rizvi, and David van Dijk

TL;DR
STRIDE is a post-training framework that trains language models to emit verifiable edit trajectories for refining biological sequences, improving success rate and novelty on protein and molecular optimization tasks.
Contribution
STRIDE combines supervised fine-tuning with policy optimization so that LLMs produce controllable, executable edit trajectories for bio-sequence refinement.
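The abstract describes the policy-optimization stage only as "group-based". One common way to realize that idea is to score each sampled trajectory against the other trajectories sampled for the same input. The sketch below assumes that mean/standard-deviation normalization; the function name, the `eps` parameter, and the normalization itself are illustrative assumptions, not details confirmed by the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of trajectories sampled for the same input.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    The source only says "group-based policy optimization".
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four edit trajectories sampled from one input.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Under this assumption, trajectories that beat their group average get a positive weight and the rest a negative one, so no separate value model is needed. A group with identical rewards yields all-zero advantages and contributes no gradient signal.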
Findings
Protein editing success increased from 42% to 89%.
Novelty in edits increased from 47% to 97%.
Outperformed diverse baselines in validity and controllability.
Abstract
Discrete biological sequence optimization requires iterative refinement under strict syntactic constraints. Diffusion models offer progressive refinement but do not naturally expose controllable discrete edit operations, while autoregressive LLMs often lack explicit long-horizon planning for constrained edits. We propose STRIDE (Sequence Trajectory Refinement via Internalized Denoising Emulation), a post-training framework that trains an LLM to emit executable trajectories of atomic edits (INSERT/DELETE/REPLACE) as a verifiable reasoning trace for variable-length refinement. STRIDE combines supervised fine-tuning on Levenshtein-aligned shortest edit demonstrations with group-based policy optimization to align edit trajectories with task rewards while preserving coherent editing behavior. Across protein fluorescence and instruction-conditioned molecular optimization, STRIDE improves…
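To make "Levenshtein-aligned shortest edit demonstrations" and "executable trajectories" concrete, here is a minimal sketch that derives a shortest INSERT/DELETE/REPLACE script between two sequences and then executes it. The `(op, position, token)` tuple format, the function names, and the example sequences are assumptions for illustration; the paper's actual trajectory format and tokenization are not given in this summary.

```python
from typing import List, Tuple

# Hypothetical edit format: (operation, position, token); token is "" for DELETE.
Edit = Tuple[str, int, str]


def shortest_edit_trajectory(src: str, tgt: str) -> List[Edit]:
    """Return a minimum-length list of atomic edits that turns src into tgt."""
    n, m = len(src), len(tgt)
    # dist[i][j] = Levenshtein distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete src[i-1]
                dist[i][j - 1] + 1,         # insert tgt[j-1]
                dist[i - 1][j - 1] + cost,  # keep or replace
            )
    # Backtrace from the end. Edits are emitted right to left, so each
    # position is still valid when the edits are applied in emitted order.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # tokens match, no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("REPLACE", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("DELETE", i - 1, ""))
            i -= 1
        else:
            edits.append(("INSERT", i, tgt[j - 1]))
            j -= 1
    return edits


def apply_edits(seq: str, edits: List[Edit]) -> str:
    """Execute a trajectory step by step, rejecting malformed edits."""
    tokens = list(seq)
    for op, pos, tok in edits:
        if op == "INSERT" and 0 <= pos <= len(tokens):
            tokens.insert(pos, tok)
        elif op == "DELETE" and 0 <= pos < len(tokens):
            del tokens[pos]
        elif op == "REPLACE" and 0 <= pos < len(tokens):
            tokens[pos] = tok
        else:
            raise ValueError(f"invalid edit: {(op, pos, tok)}")
    return "".join(tokens)


if __name__ == "__main__":
    source, target = "MKTAYIAK", "MKTVYIK"  # made-up peptide strings
    trajectory = shortest_edit_trajectory(source, target)
    print(trajectory)
    print(apply_edits(source, trajectory) == target)
```

Because every step is a concrete operation on the current sequence, a trajectory can be checked mechanically: an out-of-range or unknown edit raises an error instead of silently producing an invalid sequence, which is one plausible reading of "verifiable" here.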
Taxonomy
Topics: CRISPR and Genetic Engineering · Cell Image Analysis Techniques · RNA and protein synthesis mechanisms
