Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps

Lee Harris

arXiv:2505.11451 · cs.AI · June 4, 2025

TL;DR

This paper explores using regular expression synthesis to automatically extract complex date information from medical images, improving interpretability over black-box AI models.

Contribution

It introduces a novel approach of reverse-engineering UNIX timestamps to generate explainable regular expressions for date extraction in medical texts.

Findings

01

Manually written regular expressions detect most real dates, but also match text that only looks like a date.

02

Automatically synthesized regexes produce fewer false positives.

03

The method enhances explainability in date extraction from medical images.

Abstract

Dates often contribute towards highly impactful medical decisions, but it is rarely clear how to extract this data. AI has only just begun to be used to transcribe such documents, and the common methods are either to trust the output produced by a complex AI model, or to parse the text using regular expressions. Recent work has established that regular expressions are an explainable form of logic, but it is difficult to decompose these into the component parts that are required to construct precise UNIX timestamps. First, we tested publicly-available regular expressions, and we found that these were unable to capture a significant number of our dates. Next, we manually created easily-decomposable regular expressions, and we found that these were able to detect the majority of real dates, but also many sequences of text that only look like dates. Finally, we used regular expression synthesis…
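To make the idea of a decomposable regular expression concrete, the following is a minimal Python sketch and not the paper's own implementation. It assumes a single hypothetical day/month/year pattern with named groups (`DATE_PATTERN`), an illustrative helper name (`extract_timestamps`), and midnight UTC as the time of day. Each named group maps to one component of the timestamp, and matches that look like dates but are not valid calendar dates are discarded.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern (an assumption, not taken from the paper):
# DD/MM/YYYY or DD-MM-YYYY, with one named group per date component.
DATE_PATTERN = re.compile(
    r"\b(?P<day>0?[1-9]|[12][0-9]|3[01])[/-]"
    r"(?P<month>0?[1-9]|1[0-2])[/-]"
    r"(?P<year>(?:19|20)[0-9]{2})\b"
)


def extract_timestamps(text):
    """Return (matched_text, unix_timestamp) pairs for valid dates in text."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        # Decompose the match into its named components.
        parts = {name: int(value) for name, value in match.groupdict().items()}
        try:
            # Assumption: dates are interpreted as midnight UTC.
            moment = datetime(
                parts["year"], parts["month"], parts["day"], tzinfo=timezone.utc
            )
        except ValueError:
            # The text looks like a date but is not one, e.g. 31/02/2025.
            continue
        results.append((match.group(0), int(moment.timestamp())))
    return results
```

For example, `extract_timestamps("Scan taken 04/06/2025, follow-up 31/02/2025.")` keeps only the first date, because 31 February fails calendar validation even though the pattern matches it.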

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare