TL;DR
ConRetroBert is a dual-encoder framework with EMA stabilization for template-based retrosynthesis. It raises top-1 accuracy on USPTO-50k from 50.5% to 62.4% and handles rare templates effectively.
Contribution
ConRetroBert reframes template prediction as dense retrieval followed by ranking, using contrastive pretraining and EMA stabilization to outperform existing methods.
Findings
Top-1 accuracy improved from 50.5% to 62.4% on USPTO-50k.
EMA stabilization further boosts accuracy by 0.9%.
The retrieval approach is effective for rare templates and for predicting alternative reactants.
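The summary does not say what the EMA is applied to. A common reading is an exponential moving average of model weights, and the minimal sketch below assumes that. The function name `ema_update`, the flat list of weights, and the `decay` values are hypothetical, chosen only for illustration.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend current weights into the running average.

    Assumption: EMA is taken over model weights, which the source
    does not confirm. Each entry becomes
    decay * shadow + (1 - decay) * params.
    """
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy run: the shadow weights drift toward the current weights.
shadow = [0.0, 1.0]
for step_params in ([1.0, 1.0], [1.0, 1.0]):
    shadow = ema_update(shadow, step_params, decay=0.5)
```

A decay close to 1 makes the averaged weights change slowly, which is the usual motivation for using them at evaluation time.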
Abstract
Template-based single-step retrosynthesis predicts reactants by selecting and applying an explicit reaction template, making each prediction traceable to a chemical transformation rule. This is useful for synthesis planning, but template-based methods are often viewed as less competitive than template-free models because template prediction is commonly formulated as global classification over a long-tailed rule library. We argue that this weakness is not inherent to templates, but to the learning formulation. We present ConRetroBert, a dual-encoder framework that reframes template-based retrosynthesis as dense product-template retrieval followed by candidate-set listwise ranking. Stage 1 uses contrastive pretraining to learn a shared embedding space between products and reaction templates. Stage 2 refines template ranking over mined hard-negative candidate sets with a multi-positive…
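The Stage 1 idea of a shared embedding space learned contrastively, then used for dense retrieval, can be sketched as follows. This is not the authors' implementation: the symmetric InfoNCE loss, cosine similarity, the temperature value, and all function names are assumptions, and the vectors stand in for encoder outputs that the abstract does not specify.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def similarity_matrix(products, templates):
    """Cosine similarity between every product and every template."""
    P = [normalize(p) for p in products]
    T = [normalize(t) for t in templates]
    return [[sum(a * b for a, b in zip(p, t)) for t in T] for p in P]


def info_nce(sim, temperature=0.07):
    """Symmetric InfoNCE: product i is paired with template i."""
    n = len(sim)

    def one_direction(rows):
        loss = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            loss += log_z - logits[i]
        return loss / n

    cols = [list(c) for c in zip(*sim)]
    return 0.5 * (one_direction(sim) + one_direction(cols))


def retrieve_top_k(product, templates, k=2):
    """Indices of the k templates most similar to one product."""
    scores = similarity_matrix([product], templates)[0]
    return sorted(range(len(templates)), key=lambda j: -scores[j])[:k]


# Toy embeddings (hypothetical): product i matches template i.
products = [[1.0, 0.0], [0.0, 1.0]]
templates = [[1.0, 0.1], [0.1, 1.0]]
aligned_loss = info_nce(similarity_matrix(products, templates))
swapped_loss = info_nce(similarity_matrix(products, templates[::-1]))
top_for_first = retrieve_top_k(products[0], templates, k=1)
```

In this toy setup the correctly paired batch gives a lower loss than the swapped one, and retrieval for the first product returns its matching template. The retrieved candidates would then feed a ranking stage such as the Stage 2 the abstract describes.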