MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models
Philippe Formont, Maxime Darrin, Ismail Ben Ayed, Pablo Piantanida

TL;DR
MolRGen is a new benchmark and verifier for training and evaluating reasoning large language models on de novo molecular generation, enabling reward computation without reference molecules.
Contribution
The paper introduces MolRGen, a scalable environment for training and assessing reasoning LLMs on molecular design, together with a diversity-aware metric and fine-tuning methods.
Findings
Benchmarking shows models can generate diverse, high-scoring molecules.
Fine-tuning with GRPO improves model performance but reduces diversity.
MolRGen enables reward-based evaluation without reference molecules.
Abstract
Recent reasoning-based large language models have shown strong performance on tasks with verifiable outcomes, but their use in de novo molecular generation remains limited by the lack of training environments where rewards can be computed without reference molecules. We introduce MolRGen, a benchmark and molecular verifier for training and evaluating reasoning LLMs on de novo molecular generation. MolRGen contains approximately 4,500 protein-pocket targets, resulting in 50k multi-objective optimization prompts combining docking scores with molecular properties such as QED, synthetic accessibility, logP, and physicochemical descriptors. Unlike caption-based generation or molecule-editing benchmarks, MolRGen evaluates molecules proposed from scratch by computing rewards at generation time. We benchmark general-purpose and chemistry-specialized open-source LLMs and introduce a…
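To make the idea of a reference-free, multi-objective reward concrete, the sketch below scores a single generated molecule from property values that are assumed to be already computed (for example by a docking program and a cheminformatics toolkit). It is an illustration only, not the paper's verifier: the `Objective` class, the `reward` function, the property keys, the value ranges, and the geometric-mean aggregation are all hypothetical choices made here, since the summary above does not specify how MolRGen combines its objectives.

```python
from dataclasses import dataclass
from math import prod
from typing import Mapping, Sequence


@dataclass
class Objective:
    """One optimization target in a prompt (hypothetical structure)."""
    name: str              # property key, e.g. "docking" or "qed"
    low: float             # lower end of the range used for rescaling
    high: float            # upper end of the range used for rescaling
    maximize: bool = True  # False when smaller values are better

    def score(self, value: float) -> float:
        # Linearly rescale the raw value to [0, 1], clipping outside the range.
        frac = (value - self.low) / (self.high - self.low)
        frac = min(1.0, max(0.0, frac))
        return frac if self.maximize else 1.0 - frac


def reward(properties: Mapping[str, float],
           objectives: Sequence[Objective]) -> float:
    """Geometric mean of per-objective scores (an assumed aggregation).

    Returns 0.0 when a required property is missing, which is how this
    sketch treats molecules that could not be parsed or evaluated.
    """
    if not objectives or any(o.name not in properties for o in objectives):
        return 0.0
    scores = [o.score(properties[o.name]) for o in objectives]
    return prod(scores) ** (1.0 / len(scores))


# Assumed ranges: docking score where more negative is better, QED in [0, 1].
objectives = [
    Objective("docking", low=-12.0, high=-4.0, maximize=False),
    Objective("qed", low=0.0, high=1.0),
]
example_reward = reward({"docking": -8.0, "qed": 0.5}, objectives)
```

No reference molecule appears anywhere in the computation: the reward depends only on the proposed molecule's own properties, which is the property the abstract highlights. A geometric mean is used here so that a molecule scoring zero on any one objective receives zero overall; a weighted sum would be an equally plausible choice.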
