# AskBeacon—performing genomic data exchange and analytics with natural language

**Authors:** Anuradha Wickramarachchi, Shakila Tonni, Sonali Majumdar, Sarvnaz Karimi, Sulev Kõks, Brendan Hosking, Jordi Rambla, Natalie A Twine, Yatish Jain, Denis C Bauer

PMC · DOI: 10.1093/bioinformatics/btaf079 · Bioinformatics · 2025-02-22

## TL;DR

AskBeacon allows users to ask genomic data questions in natural language and get secure, publication-ready insights.

## Contribution

AskBeacon introduces a secure natural language interface for querying genomic data using LLMs and the Beacon protocol.

## Key findings

- In the PPMI cohort, AskBeacon found that the autosomal marker occurred 1.4 times more often in males with Parkinson's disease than in females, while the X-linked marker showed no difference.
- The system ensures genomic data is not directly exposed to LLMs, preventing leaks or falsification.
- Different LLMs and architectures were evaluated to optimize translation of research questions into Beacon queries.
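
To make the second and third findings concrete, the sketch below shows one way an LLM's structured output could be checked before it becomes a Beacon query. It is not AskBeacon's implementation: the function name `build_beacon_request`, the allow-list, and the shape of `llm_output` are hypothetical. The request body is only loosely modelled on the Beacon v2 request format, and the NCIT sex terms are illustrative. The LLM sees only the question and proposes ontology terms; it never receives genomic records.

```python
from typing import Dict, List

# Hypothetical allow-list of ontology terms the interface accepts.
# MONDO:0005180 is the Parkinson's disease term listed under "Linked entities";
# the NCIT sex terms are illustrative assumptions.
ALLOWED_FILTERS: Dict[str, str] = {
    "MONDO:0005180": "Parkinson's disease",
    "NCIT:C20197": "male",
    "NCIT:C16576": "female",
}
ALLOWED_GRANULARITY = {"boolean", "count"}


def build_beacon_request(llm_output: dict) -> dict:
    """Validate LLM-proposed fields, then build a Beacon v2-style request body."""
    terms: List[str] = llm_output.get("filters", [])
    granularity: str = llm_output.get("granularity", "count")

    # Reject any term the LLM may have invented (a simple hallucination guard).
    unknown = [t for t in terms if t not in ALLOWED_FILTERS]
    if unknown:
        raise ValueError(f"unrecognised filter terms: {unknown}")
    # Only aggregate answers are allowed; record-level output is never requested.
    if granularity not in ALLOWED_GRANULARITY:
        raise ValueError(f"granularity not permitted: {granularity}")

    return {
        "meta": {"apiVersion": "v2.0"},
        "query": {
            "filters": [{"id": t, "scope": "individuals"} for t in terms],
            "requestedGranularity": granularity,
        },
    }


# Example: "How many males in the cohort have Parkinson's disease?"
proposed = {"filters": ["MONDO:0005180", "NCIT:C20197"], "granularity": "count"}
request_body = build_beacon_request(proposed)
```

Restricting the LLM to a fixed vocabulary and to aggregate granularity means a mistaken or fabricated term fails loudly instead of silently producing a wrong cohort.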

## Abstract

Enabling clinicians and researchers to directly interact with global genomic data resources by removing technological barriers is vital for medical genomics. AskBeacon enables large language models (LLMs) to be applied to securely shared cohorts via the Global Alliance for Genomics and Health Beacon protocol. By simply “asking” Beacon, users can gain actionable insights, analyze them, and make them publication-ready.

In the Parkinson's Progression Markers Initiative (PPMI), we use natural language to ask whether the sex differences observed in Parkinson's disease are due to X-linked or autosomal markers. AskBeacon returns a publication-ready visualization showing that, for PPMI, the autosomal marker occurred 1.4 times more often in males with Parkinson's disease than in females, whereas the X-linked marker showed no difference. We evaluate commercial and open-weight LLMs, as well as different architectures, to identify the best strategy for translating research questions into Beacon queries. AskBeacon implements extensive safety guardrails to ensure that genomic data is not exposed to the LLM directly and that the generated code for data extraction, analysis, and visualization is sanitized and hallucination-resistant, so data cannot be leaked or falsified.
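
The abstract says generated code is sanitized but does not describe how. As an illustration only, the sketch below shows one common approach: statically inspecting LLM-generated Python with the standard-library `ast` module before running it. The function name `check_generated_code`, the import allow-list, and the banned-call list are assumptions made for this example and are not taken from AskBeacon.

```python
import ast
from typing import List

# Hypothetical policy: only plotting/analysis libraries may be imported,
# and calls that reach the filesystem or evaluate strings are rejected.
ALLOWED_IMPORTS = {"pandas", "matplotlib"}
BANNED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def check_generated_code(source: str) -> List[str]:
    """Return a list of policy violations found in the generated source."""
    problems: List[str] = []
    tree = ast.parse(source)  # raises SyntaxError if the code is not valid Python
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import from '{module}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                problems.append(f"call to '{node.func.id}' is not allowed")
    return problems


safe_code = "import pandas as pd\ndf = pd.DataFrame({'sex': ['M', 'F']})\n"
unsafe_code = "import os\nopen('/etc/passwd')\n"

safe_report = check_generated_code(safe_code)
unsafe_report = check_generated_code(unsafe_code)
```

A static check like this is cheap, but it is not sufficient on its own. In practice it would be paired with an isolated execution environment, since an allow-list cannot anticipate every way of reaching restricted functionality.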

AskBeacon is available at https://github.com/aehrc/AskBeacon.

## Linked entities

- **Diseases:** Parkinson's disease (MONDO:0005180)

## Full-text entities

- **Diseases:** hallucination (MESH:D006212), Parkinson's (MESH:D010300)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11889448/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11889448/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC11889448/full.md

---
Source: https://tomesphere.com/paper/PMC11889448