# Prompt-based bioinformatic pipeline generation for a multi-step metaviral workflow

**Authors:** Pengchong Ma, Haoze Zheng, Weijun Yi, Li Ma, Brandi Sigmon, Karrie A Weber, Gangqing Hu, Qiuming Yao

PMC · DOI: 10.1093/bioadv/vbaf308 · Bioinformatics Advances · 2025-11-27

## TL;DR

This paper shows that large language models can generate end-to-end bioinformatics pipelines from carefully crafted prompts, using a multi-step metaviral workflow as the representative test case.

## Contribution

The study introduces a prompt-based method for generating and updating bioinformatic pipelines using LLMs, with a focus on multi-step workflows.

## Key findings

- ChatGPT-4, ChatGPT-5, Claude 4.5, and Gemini 2.5 consistently outperformed other LLMs in pipeline generation.
- Simple prompt engineering and the inclusion of official tool documentation in the prompt improved performance, especially for newer tools.
- LLMs showed potential for both creating and updating pipelines using the proposed strategies.
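
The prompt strategies named above (listing the workflow steps, attaching official documentation, and swapping one tool for another) can be illustrated with a minimal Python sketch. This is not the authors' prompt or code; their actual prompts are in the paper and the linked repository. The `Step` class, the `build_prompt` and `substitute_tool` helpers, the prompt wording, and the example tools (fastp, MEGAHIT, metaSPAdes, and the step assigned to CheckV) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    """One workflow step. Hypothetical structure, not taken from the paper."""
    task: str  # what the step should accomplish
    tool: str  # tool the LLM is asked to use for it


def build_prompt(steps, docs=None):
    """Assemble a pipeline-generation prompt from ordered steps.

    `docs` optionally maps a tool name to a documentation excerpt that is
    appended to the prompt, mirroring the idea of including official docs.
    """
    docs = docs or {}
    lines = [
        "Write a single bash script that runs the following workflow end to end.",
        "Use the output of each step as the input of the next.",
        "",
        "Steps:",
    ]
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.task} using {step.tool}")
    for step in steps:
        if step.tool in docs:
            lines += ["", f"Documentation excerpt for {step.tool}:", docs[step.tool].strip()]
    return "\n".join(lines)


def substitute_tool(steps, old, new):
    """Return a new step list with every use of tool `old` replaced by `new`."""
    return [replace(s, tool=new) if s.tool == old else s for s in steps]


# Illustrative workflow only; the paper's actual steps and tools may differ.
steps = [
    Step("Quality-filter raw reads", "fastp"),
    Step("Assemble the filtered reads", "MEGAHIT"),
    Step("Assess the quality of viral contigs", "CheckV"),
]

# Placeholder text stands in for a real documentation excerpt.
docs = {"CheckV": "<paste the relevant section of the official documentation here>"}

prompt = build_prompt(steps, docs)
print(prompt)
```

A pipeline update can then be requested by rebuilding the prompt from `substitute_tool(steps, "MEGAHIT", "metaSPAdes")`, which leaves the original step list unchanged. The resulting string would be sent to whichever LLM is being evaluated; that call is deliberately left out because it depends on the provider.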

## Abstract

The rapid evolution of bioinformatics tools and multi-step analytic procedures presents a challenge for building effective pipelines, particularly for researchers without extensive programming expertise. This study demonstrates that large language models (LLMs) hold strong potential for generating end-to-end bioinformatic pipelines through carefully crafted prompts, using a multi-step metaviral workflow as a representative example. Multiple LLMs were tested for their effectiveness, including the OpenAI ChatGPT series, the Anthropic Claude series, Google Gemini, Meta Llama, and DeepSeek.

Our results show that ChatGPT-4, ChatGPT-5, Claude 4.5, and Gemini 2.5 consistently outperform other LLMs in generating complete bioinformatic pipelines, with statistically significant success rates. These models also handle tool substitutions effectively. Simple prompt engineering and the inclusion of official documentation further enhance performance, especially for newer bioinformatic tools. While capabilities vary, all LLMs tested show potential for both pipeline generation and updates with our designed prompts and strategies.

All prompts are available in the paper. The examples are available on GitHub at https://github.com/mpckkk/pBio.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Enterobacterales (order) [taxon 91347]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12782108/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12782108/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12782108/full.md

---
Source: https://tomesphere.com/paper/PMC12782108