Leveraging Large Language Models for enzymatic reaction prediction and characterization
Lorenzo Di Fruscia, Jana Marie Weber

TL;DR
This paper evaluates the effectiveness of large language models, especially Llama-3.1, in predicting and characterizing enzymatic reactions, showing their potential and limitations in biochemical tasks.
Contribution
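The paper systematically assesses LLMs on three enzymatic reaction tasks (EC number prediction, forward synthesis, and retrosynthesis). It compares single-task and multitask fine-tuning with LoRA adapters and analyzes performance across data regimes, including low-data settings, revealing both capabilities and open challenges.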
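Multitask fine-tuning of this kind can be sketched as pooling instruction-style examples from all three tasks into one shuffled training set. The `Reaction` record, the prompt wording, and the choice to supply the EC number as context for forward synthesis and retrosynthesis are hypothetical illustrations, not the authors' data format. The ethanol-to-acetaldehyde reaction (alcohol dehydrogenase, EC 1.1.1.1) is a toy example with cofactors omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Reaction:
    # Hypothetical record: SMILES strings plus a four-field EC number.
    substrates: str
    products: str
    ec_number: str


def make_examples(rxn: Reaction) -> list:
    """Turn one annotated reaction into one example per task (hypothetical templates)."""
    return [
        {
            "task": "ec_prediction",
            "prompt": f"Predict the EC number for the reaction {rxn.substrates}>>{rxn.products}.",
            "target": rxn.ec_number,
        },
        {
            # Assumption: the EC number is given as context for the synthesis tasks.
            "task": "forward_synthesis",
            "prompt": f"Substrates: {rxn.substrates}. Enzyme class: EC {rxn.ec_number}. Predict the products.",
            "target": rxn.products,
        },
        {
            "task": "retrosynthesis",
            "prompt": f"Products: {rxn.products}. Enzyme class: EC {rxn.ec_number}. Predict the substrates.",
            "target": rxn.substrates,
        },
    ]


def build_multitask_set(reactions, seed: int = 0) -> list:
    """Pool examples from all tasks and shuffle them so batches mix tasks."""
    examples = [ex for rxn in reactions for ex in make_examples(rxn)]
    random.Random(seed).shuffle(examples)  # fixed seed keeps the order reproducible
    return examples


if __name__ == "__main__":
    for ex in make_examples(Reaction("CCO", "CC=O", "1.1.1.1")):
        print(ex["task"], "->", ex["target"])
    # prints:
    # ec_prediction -> 1.1.1.1
    # forward_synthesis -> CC=O
    # retrosynthesis -> CCO
```

A single-task baseline would instead train one adapter per `task` value; the multitask variant trains one adapter on the whole shuffled pool.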
Findings
Fine-tuned LLMs effectively capture biochemical knowledge.
Multitask learning improves enzyme reaction predictions.
Challenges remain in hierarchical enzyme classification schemes.
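The difficulty with hierarchical schemes is easier to see with a per-level metric. An EC number has four fields (class, subclass, sub-subclass, serial number), so a prediction can be correct at the class level yet wrong at the last field. The sketch below scores predictions at each depth; the function name and the metric itself are illustrative, not necessarily what the paper reports.

```python
def ec_level_accuracy(predicted, true, levels=4):
    """Fraction of predictions whose first k EC fields match the reference, for k = 1..levels."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if not true:
        raise ValueError("need at least one example")
    accuracies = []
    for k in range(1, levels + 1):
        # Truncated or malformed predictions (e.g. "1.1") simply fail to match at deeper levels.
        hits = sum(p.split(".")[:k] == t.split(".")[:k] for p, t in zip(predicted, true))
        accuracies.append(hits / len(true))
    return accuracies


if __name__ == "__main__":
    pred = ["1.1.1.1", "1.1.1.2", "1.2.3.4", "2.7.1.1"]
    gold = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "3.1.1.1"]
    print(ec_level_accuracy(pred, gold))  # [0.75, 0.5, 0.5, 0.25]
```

Accuracy can only stay flat or drop as k grows, because a match at depth k requires a match at every shallower depth; exact-match accuracy is the value at k = 4.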
Abstract
Predicting enzymatic reactions is crucial for applications in biocatalysis, metabolic engineering, and drug discovery, yet it remains a complex and resource-intensive task. Large Language Models (LLMs) have recently demonstrated remarkable success in various scientific domains, e.g., through their ability to generalize knowledge, reason over complex structures, and leverage in-context learning strategies. In this study, we systematically evaluate the capability of LLMs, particularly the Llama-3.1 family (8B and 70B), across three core biochemical tasks: Enzyme Commission number prediction, forward synthesis, and retrosynthesis. We compare single-task and multitask learning strategies, employing parameter-efficient fine-tuning via LoRA adapters. Additionally, we assess performance across different data regimes to explore their adaptability in low-data settings. Our results demonstrate…
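LoRA adapters keep a pretrained weight matrix W frozen and learn a low-rank update, so the adapted layer computes W x + (α/r) B A x, with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below illustrates that standard formulation on toy matrices. The rank, α, initialization scale, and the class and function names are illustrative assumptions rather than the configuration used in the paper; real fine-tuning would rely on an optimized library.

```python
import random


def matvec(M, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weight, never updated during fine-tuning
        d_out, d_in = len(W), len(W[0])
        self.scale = alpha / r
        # A: r x d_in with small random values; B: d_out x r initialized to zero,
        # so the low-rank update B @ A is exactly zero before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))   # adapter path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]


def trainable_params(d_out, d_in, r):
    """Return (parameters in A and B, parameters in the full matrix W)."""
    return r * (d_in + d_out), d_out * d_in


if __name__ == "__main__":
    layer = LoRALinear([[1, 2], [3, 4]], r=1, alpha=2)
    print(layer.forward([1, 1]))               # [3.0, 7.0], identical to W x at init
    print(trainable_params(4096, 4096, 16))    # (131072, 16777216)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly. For a 4096 × 4096 matrix with r = 16 (both illustrative values), the adapter trains 131,072 parameters instead of 16,777,216, under 1% of that matrix.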
Taxonomy
Topics: Machine Learning in Materials Science · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
