Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models

Pranta Saha; Joyce Reimer; Brook Byrns; Connor Burbridge; Neeraj Dhar; Jeffrey Chen; Steven Rayan; Gordon Broderick

arXiv:2507.04432·q-bio.MN·July 8, 2025

TL;DR

This study shows that a small, domain-specific language model, trained on limited data with active learning, can predict molecular interactions in biological pathways with over 80% accuracy, helping to fill gaps in our understanding of intracellular processes.

Contribution

The paper pairs a very small BERT-based model with active learning to reconstruct biological pathways, reducing the amount of training data needed while improving accuracy.

Findings

01

Achieved over 80% accuracy in predicting molecular interactions.

02

Active learning with entropy-based selection improves model performance.

03

Small models can effectively contribute to biomedical pathway reconstruction.
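To make the entropy-based selection in finding 02 concrete, here is a minimal sketch of generic uncertainty sampling: score each unlabeled example by the Shannon entropy of the model's predicted class probabilities, then pick the highest-scoring examples for the next training round. It is not taken from the paper. The function names, the three interaction classes, and the probability values are all hypothetical, and the authors' actual selection procedure may differ.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, k):
    """Return indices of the k pool examples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four candidate interactions, as
# probabilities over assumed classes (activates, inhibits, no interaction).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
]

# Indices of the two examples the model is least sure about.
selected = select_most_uncertain(pool_probs, k=2)
print(selected)
```

In an incremental learning loop, the selected examples would be added to the training set and the model retrained before scoring the remaining pool again.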

Abstract

The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general-purpose large language models (LLMs) show a tendency to deliver creative answers, often called "hallucinations", which have slowed their application in the medical and biomedical fields, where accuracy is paramount. We propose that the design and use of much smaller, domain-specific and even task-specific LMs may be a more rational and appropriate use of this technology in biomedical research. In this work we apply an LM that is very small by today's standards to the specialized task of predicting regulatory interactions between molecular components, in order to fill gaps in our current understanding of intracellular pathways. Toward this end, we attempt to correctly posit known pathway-informed interactions recovered from manually curated pathway databases by selecting and…
