A frame semantic overview of NLP-based information extraction for cancer-related EHR notes
Surabhi Datta, Elmer V Bernstam, Kirk Roberts

TL;DR
This paper provides a comprehensive overview of NLP techniques used to extract structured cancer-related information from EHR notes, highlighting common frames and gaps for future research.
Contribution
It introduces a frame semantic organization of existing NLP methods for cancer information extraction from EHRs, identifying key areas and proposing a resource for future work.
Findings
Cancer diagnosis is the most commonly extracted frame.
Recent work emphasizes treatment and breast cancer diagnosis.
A resource of annotated cancer frames could benefit future research.
Abstract
Objective: Electronic Health Record (EHR) notes contain a large amount of cancer-related information that can be useful for biomedical research, provided natural language processing (NLP) methods are available to extract and structure it. In this paper, we present a scoping review of existing clinical NLP literature for cancer. Methods: We searched PubMed, Google Scholar, ACL Anthology, and existing reviews for studies describing an NLP method to extract specific cancer-related information from EHR sources. We applied two exclusion criteria: we excluded articles whose extraction techniques were too broad to be represented as frames, and articles that used very low-level extraction methods. The final review included 79 articles. We organized this information according to frame semantic principles to help identify common areas of overlap and…
