# GiantHunter: accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search

**Authors:** Fuchuan Qu, Cheng Peng, Jiaojiao Guan, Donglin Wang, Yanni Sun, Jiayu Shang

PMC · DOI: 10.1093/bioinformatics/btaf239 · 2025-07-15

## TL;DR

GiantHunter is a reinforcement learning-based tool for detecting giant viruses (nucleocytoplasmic large DNA viruses, NCLDVs) in metagenomic data. It achieves a higher F1-score at a lower computational cost than existing methods.

## Contribution

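GiantHunter introduces a reinforcement learning approach in which Monte Carlo tree search selects representative non-NCLDV sequences as negative training data. This yields a more robust decision boundary for NCLDV detection in metagenomic data.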

## Key findings

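- GiantHunter improves the F1-score by 10% and reduces computational cost by 90% compared with the second-best method.
- Applied to 60 metagenomic datasets from six cities along the Yangtze River, upstream and downstream of the Three Gorges Dam, GiantHunter revealed significant differences in NCLDV diversity correlated with proximity to the dam, likely influenced by the reduced flow velocity the dam causes.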
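The F1-score used in the comparison is the harmonic mean of precision and sensitivity (recall), so it rewards tools that balance the two rather than maximizing one. The counts below are invented purely to illustrate the arithmetic and are not results from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    f1 = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR / (P + R)
    return precision, recall, f1


# Invented counts: 90 contigs correctly flagged as NCLDV, 10 false alarms, 60 missed.
p, r, f1 = f1_from_counts(tp=90, fp=10, fn=60)
print(p, r, f1)  # → 0.9 0.6 0.72
```

A tool with high precision but modest sensitivity still lands at a middling F1. Gains therefore require improving sensitivity without giving up precision.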

## Abstract

Nucleocytoplasmic large DNA viruses (NCLDVs) are notable for their large genomes and extensive gene repertoires, which contribute to their widespread environmental presence and critical roles in processes such as host metabolic reprogramming and nutrient cycling. Metagenomic sequencing has emerged as a powerful tool for uncovering novel NCLDVs in environmental samples. However, identifying NCLDV sequences in metagenomic data remains challenging due to their high genomic diversity, limited reference genomes, and shared regions with other microbes. Existing alignment-based and machine learning methods struggle with achieving optimal trade-offs between sensitivity and precision.

In this work, we present GiantHunter, a reinforcement learning-based tool for identifying NCLDVs from metagenomic data. By employing a Monte Carlo tree search strategy, GiantHunter dynamically selects representative non-NCLDV sequences as the negative training data, enabling the model to establish a robust decision boundary. Benchmarking on rigorously designed experiments shows that GiantHunter achieves high precision while maintaining competitive sensitivity, improving the F1-score by 10% and reducing computational cost by 90% compared to the second-best method. To demonstrate its real-world utility, we applied GiantHunter to 60 metagenomic datasets collected from six cities along the Yangtze River, located both upstream and downstream of the Three Gorges Dam. The results reveal significant differences in NCLDV diversity correlated with proximity to the dam, likely influenced by reduced flow velocity caused by the dam. These findings highlight GiantHunter’s potential to advance our understanding of NCLDVs and their ecological roles in diverse environments.

The source code of GiantHunter is available via: https://github.com/FuchuanQu/GiantHunter.
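The abstract says only that Monte Carlo tree search picks representative non-NCLDV sequences as negative training data. The sketch below is a generic UCT (UCB1-guided) tree search over a pool of candidate groups, not GiantHunter's implementation. The group names, the `budget`, the function `mcts_select_negatives`, and the toy `reward` are all hypothetical. In a realistic setting the reward would presumably come from validating a classifier trained on the chosen negatives.

```python
import math
import random
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple


class Node:
    """Search-tree node holding a partial selection of (hypothetical) negative groups."""

    def __init__(self, chosen: FrozenSet[str], parent: Optional["Node"] = None):
        self.chosen = chosen
        self.parent = parent
        self.children: List["Node"] = []
        self.untried: List[str] = []  # groups not yet expanded from this node
        self.visits = 0
        self.value = 0.0  # sum of rollout rewards seen through this node

    def ucb1(self, c: float) -> float:
        # UCB1: mean reward plus an exploration bonus for rarely visited nodes.
        mean = self.value / self.visits
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return mean + bonus


def mcts_select_negatives(
    groups: Iterable[str],
    budget: int,
    reward: Callable[[FrozenSet[str]], float],
    iterations: int = 500,
    c: float = 1.4,
    seed: int = 0,
) -> Tuple[FrozenSet[str], float]:
    """Hypothetical sketch: choose `budget` groups that maximise a toy `reward`."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    pool = sorted(set(groups))
    rng = random.Random(seed)
    root = Node(frozenset())
    root.untried = list(pool)
    best_set, best_reward = frozenset(), float("-inf")

    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 through fully expanded nodes.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1(c))
        # 2. Expansion: add one untried group as a new child node.
        if node.untried:
            group = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.chosen | {group}, parent=node)
            if len(child.chosen) < budget:
                child.untried = [g for g in pool if g not in child.chosen]
            node.children.append(child)
            node = child
        # 3. Simulation: complete the selection at random up to the budget.
        rest = [g for g in pool if g not in node.chosen]
        rng.shuffle(rest)
        full = node.chosen | frozenset(rest[: budget - len(node.chosen)])
        r = reward(full)
        if r > best_reward:
            best_set, best_reward = full, r
        # 4. Backpropagation: credit the reward to every node on the path.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent

    return best_set, best_reward


# Made-up "informativeness" scores for hypothetical non-NCLDV groups.
scores = {"bacteria": 0.9, "archaea": 0.5, "phage": 0.8, "eukaryote": 0.7}
best, value = mcts_select_negatives(
    scores, budget=2, reward=lambda s: sum(scores[g] for g in s)
)
print(sorted(best), round(value, 2))  # → ['bacteria', 'phage'] 1.7
```

Because this toy reward is additive, a greedy pick would do just as well. Tree search becomes worthwhile when the value of a negative set depends on interactions among its members, as a classifier's validation score would.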

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12261416/full.md

---
Source: https://tomesphere.com/paper/PMC12261416