The Radiation Oncology NLP Database
Zhengliang Liu, Jason Holmes, Wenxiong Liao, Chenbin Liu, Lian Zhang, Hongying Feng, Peilong Wang, Muhammad Ali Elahi, Hongmin Cai, Lichao Sun, Quanzheng Li, Xiang Li, Tianming Liu, Jiajian Shen, Wei Liu

TL;DR
The Radiation Oncology NLP Database (ROND) is a specialized dataset for NLP tasks in radiation oncology. It covers a diverse set of tasks, provides a large instruction-tuning dataset, and is accompanied by a trained large language model, CancerChat.
Contribution
ROND is the first dedicated NLP dataset for radiation oncology, enabling domain-specific NLP research and development with multiple tasks and a large instruction-tuning dataset.
Findings
Developed a comprehensive radiation oncology NLP dataset (ROND)
Created an instruction-tuning dataset with over 20k pairs
Trained a large language model, CancerChat, demonstrating domain-specific NLP capabilities
Abstract
We present the Radiation Oncology NLP Database (ROND), the first dedicated Natural Language Processing (NLP) dataset for radiation oncology, an important medical specialty that has received limited attention from the NLP community in the past. With the advent of Artificial General Intelligence (AGI), there is an increasing need for specialized datasets and benchmarks to facilitate research and development. ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration. It encompasses various NLP tasks including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations, each with a distinct focus on radiation oncology concepts and application cases. In addition, we have developed an instruction-tuning dataset…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
