TL;DR
This paper introduces PolyBench, a large-scale benchmark dataset for polymer design tasks, and a knowledge-augmented reasoning distillation method to improve LLMs' reasoning about polymers.
Contribution
The paper contributes a large-scale dataset of polymer design tasks and a knowledge-augmented training method intended to strengthen LLMs' capabilities in polymer science.
Findings
Small LLMs trained on PolyBench outperform similar-sized models that were not trained on it.
Models trained on PolyBench perform well on external polymer benchmarks.
PolyBench enables better generalization and diagnostic testing for polymer reasoning.
Abstract
Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs are ineffective in this problem space because: (i) most models lack polymer-specific knowledge, and (ii) existing aligned models have limited coverage of knowledge and capabilities relevant to polymer design. Addressing this, we introduce PolyBench, a large-scale training and test benchmark dataset of more than 125K polymer design-related tasks, leveraging a knowledge base of more than 13 million data points obtained from experimental and synthetic data sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments this dataset with structured CoT. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems,…
