SciDesignBench: Benchmarking and Improving Language Models for Scientific Inverse Design
David van Dijk, Ivan Vrkic

TL;DR
SciDesignBench is a comprehensive benchmark for scientific inverse design tasks, revealing current model limitations and demonstrating that simulator-feedback training can significantly improve success rates in complex design problems.
Contribution
The paper introduces SciDesignBench, a benchmark of 520 simulator-grounded tasks across 14 scientific domains for inverse design, and proposes RLSF, a training recipe that improves model performance on these tasks.
Findings
On the 10-domain shared-core subset, the best zero-shot model reaches only 29.0% success.
Simulator feedback improves success, but which model ranks best depends on the feedback horizon.
RLSF training boosts success rates by 8-17 percentage points.
Abstract
Many of the most important problems in science and engineering are inverse problems: given a desired outcome, find a design that achieves it. Evaluating whether a candidate meets the spec is often routine; a binding energy can be computed, a reactor yield simulated, a pharmacokinetic profile predicted. But searching a combinatorial design space for inputs that satisfy those targets is fundamentally harder. We introduce SciDesignBench, a benchmark of 520 simulator-grounded tasks across 14 scientific domains and five settings spanning single-shot design, short-horizon feedback, long-horizon refinement, and seed-design optimization. On the 10-domain shared-core subset, the best zero-shot model reaches only 29.0% success despite substantially higher parse rates. Simulator feedback helps, but the leaderboard changes with horizon: Sonnet 4.5 is strongest in one-turn de novo design, whereas…
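The asymmetry the abstract describes (checking a candidate is cheap, finding one is hard) and the role of feedback horizon can be illustrated with a toy loop. This is only a sketch under stated assumptions: `simulate`, `parse_design`, `run_episode`, and `make_bisection_proposer` are hypothetical names invented for illustration, the quadratic "simulator" is a stand-in, and the bisection proposer merely plays the part of a model; none of this is the benchmark's actual interface or the RLSF recipe.

```python
from typing import Callable, Optional, Tuple

Feedback = Optional[Tuple[float, float]]  # (previous design, simulated outcome)


def simulate(design: float) -> float:
    """Toy forward simulator (hypothetical): cheap to evaluate for any design."""
    return design * design + 1.0


def parse_design(reply: str) -> Optional[float]:
    """Parse a proposer's text reply into a design; None if it is not a number."""
    try:
        return float(reply)
    except ValueError:
        return None


def run_episode(
    propose: Callable[[float, Feedback], str],
    target: float,
    tol: float,
    max_turns: int,
) -> Tuple[bool, bool]:
    """Run up to max_turns of propose -> simulate -> feedback.

    Returns (parsed, success): whether any reply parsed, and whether any
    parsed design landed within tol of the target outcome.
    """
    feedback: Feedback = None
    parsed_any = False
    for _ in range(max_turns):
        design = parse_design(propose(target, feedback))
        if design is None:
            continue  # unparseable reply: a turn is spent, no new feedback
        parsed_any = True
        outcome = simulate(design)
        if abs(outcome - target) <= tol:
            return True, True
        feedback = (design, outcome)
    return parsed_any, False


def make_bisection_proposer(lo: float, hi: float) -> Callable[[float, Feedback], str]:
    """Stand-in for a model: narrows an interval using simulator feedback."""
    bounds = {"lo": lo, "hi": hi}

    def propose(target: float, feedback: Feedback) -> str:
        if feedback is not None:
            design, outcome = feedback
            # The toy simulator is increasing on [lo, hi] when lo >= 0.
            if outcome < target:
                bounds["lo"] = design
            else:
                bounds["hi"] = design
        return str((bounds["lo"] + bounds["hi"]) / 2)

    return propose


if __name__ == "__main__":
    for turns in (1, 3):
        result = run_episode(make_bisection_proposer(0.0, 8.0), 10.0, 0.1, turns)
        print(turns, result)
```

In this toy setup the same proposer parses but misses the target with a single turn and succeeds once it has three turns of feedback, which is why parse rate and success rate are tracked separately and why results can differ by horizon.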
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Topic Modeling
