Scientific Large Language Models: A Survey on Biological & Chemical Domains
Qiang Zhang, Keyang Ding, Tianwen Lyv, Xinda Wang, Qingyu Yin, Yiwen Zhang, Jing Yu, Yuhao Wang, Xiaotong Li, Zhuoyi Xiang, Kehua Feng, Xiang Zhuang, Zeyuan Wang, Ming Qin, Mengyao Zhang, Jinlu Zhang, Jiyu Cui, Tao Huang, Pengju Yan, Renjun Xu, Hongyang Chen, Xiaolin Li

TL;DR
This survey reviews recent advancements in scientific large language models specifically tailored for biological and chemical domains, highlighting their architectures, datasets, capabilities, challenges, and future research directions.
Contribution
It provides the first comprehensive, systematic overview of scientific LLMs in biology and chemistry, detailing technical developments and identifying key challenges and opportunities.
Findings
Analyzed various model architectures and datasets used in scientific LLMs.
Identified key challenges such as data scarcity and model interpretability.
Outlined promising future research directions in scientific LLM development.
Abstract
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling
