An Empirical Analysis of Fine-Tuning Large Language Models on Bioinformatics Literature: PRSGPT and BioStarsGPT

Muhammad Muneeb; David B. Ascher

arXiv:2601.11573 · cs.CL · January 21, 2026

TL;DR

This paper introduces a reproducible pipeline for fine-tuning large language models on bioinformatics data. It is demonstrated through two specialized models, PRSGPT and BioStarsGPT, which show improved benchmark performance and are accompanied by rich datasets for domain-specific applications.

Contribution

The paper presents a novel, scalable pipeline for domain-specific fine-tuning of LLMs using diverse bioinformatics data sources and prompt-based QA generation, with extensive benchmarking and human evaluation.

Findings

01. Qwen2.5-7B was the best performer among the three fine-tuned models on the benchmark metrics.

02. PRSGPT achieved 61.9% accuracy in human evaluation.

03. BioStarsGPT demonstrated 59% conceptual accuracy.

Abstract

Large language models (LLMs) often lack specialized knowledge for complex bioinformatics applications. We present a reproducible pipeline for fine-tuning LLMs on specialized bioinformatics data, demonstrated through two use cases: PRSGPT, focused on polygenic risk score (PRS) tools, and BioStarsGPT, trained on community forum discussions. The nine-step pipeline integrates diverse data sources, structured preprocessing, prompt-based question-answer (QA) generation (via Google Gemini), natural language inference (NLI) for quality control, semantic deduplication, clustering-based data splitting, and parameter-efficient fine-tuning using LoRA. We fine-tuned three LLMs (LLaMA-3.2-3B, Qwen2.5-7B, Gemma) and benchmarked them on over 14 lexical and semantic metrics. Qwen2.5-7B emerged as the best performer, with BLEU-4 and ROUGE-1 improvements of 82% and 70% for PRSGPT and 6% and 18% for…
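Two of the pipeline steps named in the abstract, semantic deduplication and clustering-based data splitting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`semantic_dedup`, `split_by_cluster`), the 0.95 cosine threshold, the greedy keep-first strategy, and the 20% test fraction are all assumptions made for illustration, and the embeddings and cluster labels are assumed to come from some upstream model that the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedup(embeddings: Sequence[Sequence[float]],
                   threshold: float = 0.95) -> List[int]:
    """Greedy near-duplicate removal (hypothetical helper).

    Keeps an item only if its similarity to every already-kept item is
    below `threshold`; the threshold value is an assumption.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def split_by_cluster(cluster_ids: Sequence[int],
                     test_fraction: float = 0.2) -> Dict[str, List[int]]:
    """Assign whole clusters to train or test (hypothetical helper).

    Keeping each cluster on one side of the split stops closely related
    QA pairs from appearing in both training and evaluation data.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)

    target = test_fraction * len(cluster_ids)
    train: List[int] = []
    test: List[int] = []
    # Visit the smallest clusters first and fill the test set up to the target size.
    for cid in sorted(clusters, key=lambda c: (len(clusters[c]), c)):
        members = clusters[cid]
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return {"train": sorted(train), "test": sorted(test)}


if __name__ == "__main__":
    # Toy 2-D "embeddings"; the second vector is nearly identical to the first.
    vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0], [0.7, 0.7]]
    print(semantic_dedup(vectors))

    # Toy cluster labels for ten QA pairs.
    labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
    print(split_by_cluster(labels))
```

The greedy pass compares every item against every kept item, so it is quadratic in the number of items; for a large QA corpus an approximate nearest-neighbour index would be a more practical choice.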

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare