TL;DR
GDC Cohort Copilot is an open-source AI tool that helps users create cancer genomics cohorts from natural language descriptions, improving accessibility and efficiency in cohort curation.
Contribution
We developed and evaluated a locally-served large language model that outperforms GPT-4o in generating GDC cohorts from natural language descriptions.
Findings
Locally-served GDC Cohort LLM outperforms GPT-4o in cohort generation.
Open-source GDC Cohort Copilot is available as a containerized app.
The tool simplifies cohort creation from natural language for GDC users.
Abstract
The Genomic Data Commons (GDC) provides access to high-quality, harmonized cancer genomics data through a unified curation and analysis platform centered around patient cohorts. While GDC users can interactively create complex cohorts through the graphical Cohort Builder, users (especially new ones) may struggle to find specific cohort descriptors across hundreds of possible fields and properties. However, users may be better able to describe their desired cohort in free-text natural language. We introduce GDC Cohort Copilot, an open-source copilot tool for curating cohorts from the GDC. GDC Cohort Copilot automatically generates the GDC cohort filter corresponding to a user-input natural language description of their desired cohort, before exporting the cohort back to the GDC for further analysis. An interactive user interface allows users to further refine the generated cohort. We…
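To make the idea of a "cohort filter" concrete, the following is a minimal sketch of what such a filter might look like for a simple description. It assumes the generated filter resembles the nested `op`/`content` JSON accepted by the public GDC API `filters` parameter; the exact schema the tool emits is not stated in the abstract. The helper names `in_filter` and `all_of` and the example description are hypothetical and chosen only for illustration.

```python
import json


def in_filter(field, values):
    """Leaf clause: match cases whose `field` takes any of `values`."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def all_of(*clauses):
    """Combine clauses with a logical AND."""
    return {"op": "and", "content": list(clauses)}


# Hypothetical description:
#   "female patients in the TCGA breast cancer project"
# Assumed mapping to GDC fields (illustrative, not the tool's actual output).
cohort_filter = all_of(
    in_filter("cases.project.project_id", ["TCGA-BRCA"]),
    in_filter("cases.demographic.gender", ["female"]),
)

# Serialize to the JSON form that would be handed to the GDC.
print(json.dumps(cohort_filter, indent=2))
```

Even this two-clause cohort requires knowing the exact field paths and allowed values, which is the lookup burden the abstract says a natural language interface is meant to remove.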
