TL;DR
The paper introduces Supergroup Relative Policy Optimization (SGRPO), a flexible framework for optimizing biomolecular generators for both utility and diversity. It expands the utility-diversity Pareto frontier across various design tasks.
Contribution
SGRPO provides a set-level diversity reward mechanism that is decoupled from any specific generator or diversity metric, which improves utility-diversity trade-offs in biomolecular design.
Findings
SGRPO expands the utility-diversity Pareto frontier in multiple design tasks.
It outperforms pretrained generators, GRPO, and memory-assisted GRPO on frontier-level metrics.
Set-level diversity rewards help preserve broader coverage of the generation distribution.
Abstract
Biomolecular generators are often adapted with reward feedback to improve task-specific utility, but pushing utility alone can concentrate generation on a narrow family of candidates. Maintaining diversity is difficult because sample diversity is a set-level property. We introduce Supergroup Relative Policy Optimization (SGRPO), a flexible GRPO-style framework that directly constructs rewards from set-level diversity. For each condition, SGRPO samples a supergroup of candidate sets, compares their diversity under the same condition, and redistributes the group diversity reward to individual rollouts through leave-one-out diversity contributions before combining it with rollout-level utility. This design decouples SGRPO from a particular generator, utility reward, or diversity metric, and allows instantiation with different GRPO-style approaches. We evaluate SGRPO on de novo…
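The reward construction described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Set diversity is taken to be mean pairwise distance. A set's diversity is split across its rollouts in proportion to their positive leave-one-out contributions, so the per-rollout mean equals the set's diversity. That diversity share is added to utility with a weight `lam`, and rewards are then normalized GRPO-style across every rollout in the supergroup for one condition. All function names, the redistribution rule, `lam`, and the Hamming-distance example are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Any, Callable, List, Sequence

Dist = Callable[[Any, Any], float]  # any pairwise distance (assumed metric-agnostic)


def set_diversity(items: Sequence[Any], dist: Dist) -> float:
    """Mean pairwise distance of a candidate set (0.0 if fewer than two items)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def loo_contributions(items: Sequence[Any], dist: Dist) -> List[float]:
    """Drop in set diversity when each rollout is left out (may be negative)."""
    items = list(items)
    full = set_diversity(items, dist)
    return [full - set_diversity(items[:i] + items[i + 1:], dist)
            for i in range(len(items))]


def redistribute(items: Sequence[Any], dist: Dist) -> List[float]:
    """Hypothetical rule: split set diversity by positive LOO share (uniform fallback)."""
    n = len(items)
    d = set_diversity(items, dist)
    pos = [max(c, 0.0) for c in loo_contributions(items, dist)]
    total = sum(pos)
    weights = [p / total for p in pos] if total > 0 else [1.0 / n] * n
    # Scaled by n so the mean per-rollout reward equals the set's diversity.
    return [n * d * w for w in weights]


def supergroup_advantages(supergroup: List[List[Any]], utilities: List[List[float]],
                          dist: Dist, lam: float = 1.0, eps: float = 1e-8) -> List[float]:
    """Combine utility with redistributed diversity, then normalize over the supergroup."""
    rewards: List[float] = []
    for items, utils in zip(supergroup, utilities):
        div = redistribute(items, dist)
        rewards.extend(u + lam * r for u, r in zip(utils, div))
    # One shared baseline compares all candidate sets sampled for the same condition.
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def hamming(a: str, b: str) -> float:
    """Normalized Hamming distance for equal-length sequences (illustrative)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Usage: one condition, a supergroup of two candidate sets of three sequences each.
supergroup = [["ACDE", "ACDF", "ACDG"], ["ACDE", "WYKL", "MNPQ"]]
utilities = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]
adv = supergroup_advantages(supergroup, utilities, hamming)
```

In this toy run, the second set has lower utility but much higher diversity, so all of its rollouts receive positive advantages and all rollouts from the near-duplicate first set receive negative ones. Normalizing over the whole supergroup, rather than within each set, is one plausible way to realize the abstract's "compares their diversity under the same condition".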
