Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps
Lee Harris

TL;DR
This paper explores using regular expression synthesis to automatically extract complex date information from medical images, improving interpretability over black-box AI models.
Contribution
It introduces a novel approach of reverse-engineering UNIX timestamps to generate explainable regular expressions for date extraction in medical texts.
Findings
Manual regular expressions detect most real dates.
Automatically synthesized regexes produce fewer false positives.
The method enhances explainability in date extraction from medical images.
Abstract
Dates often contribute towards highly impactful medical decisions, but it is rarely clear how to extract this data. AI has only just begun to be used to transcribe such documents, and the common methods are either to trust the output produced by a complex AI model, or to parse the text using regular expressions. Recent work has established that regular expressions are an explainable form of logic, but it is difficult to decompose these into the component parts that are required to construct precise UNIX timestamps. First, we tested publicly available regular expressions, and we found that these were unable to capture a significant number of our dates. Next, we manually created easily decomposable regular expressions, and we found that these were able to detect the majority of real dates, but also many sequences of text that merely look like dates. Finally, we used regular expression synthesis…
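As a rough illustration of what "easily decomposable" could mean here, the sketch below uses a regular expression whose named groups map one-to-one onto the components of a UNIX timestamp. It is not the paper's method: the pattern, the day-first ordering, the UTC midnight convention, and the function name `extract_timestamps` are all assumptions made for this example. It also shows why date-like text needs a second check: a string can match the pattern without being a real calendar date.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern (not from the paper): each named group corresponds to
# one component needed to build a timestamp. Day-first order is assumed.
DATE_PATTERN = re.compile(
    r"\b(?P<day>\d{1,2})[/-](?P<month>\d{1,2})[/-](?P<year>\d{4})\b"
)


def extract_timestamps(text):
    """Return (matched_text, unix_timestamp) pairs for valid calendar dates."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        # Decompose the match into its integer components by group name.
        parts = {name: int(value) for name, value in match.groupdict().items()}
        try:
            # Assumption: interpret the date as midnight UTC.
            moment = datetime(
                parts["year"], parts["month"], parts["day"], tzinfo=timezone.utc
            )
        except ValueError:
            # The text looks like a date but is not one (e.g. month 13).
            continue
        results.append((match.group(0), int(moment.timestamp())))
    return results


found = extract_timestamps("Scan taken 02/01/1970, ref 45/13/2020")
print(found)  # [('02/01/1970', 86400)]
```

Because every component is a named group, each extracted timestamp can be traced back to the exact characters that produced its day, month, and year, which is the kind of explainability a black-box transcription model does not offer.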
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
