Large-scale data extraction from the UNOS organ donor documents
Marek Rychlik, Bekir Tanriover, Yan Han

TL;DR
This paper presents a method to extract and convert large-scale, heterogeneous PDF data from OPTN organ donor records into an analyzable database, built first from 2022 data and designed to extend to all US organ donor data since 2008.
Contribution
We developed a scalable approach to convert OPTN PDF documents into an analyzable database. The database is built from 2022 data, and the approach is shown to extend to all OPTN data since 2008, which could not previously be analyzed at this scale.
Findings
Built a large, analyzable database from 2022 OPTN data
Demonstrated the method's extension to all past and future data
Enabled large-scale analysis of organ donor information
Abstract
In this paper we focus on three major tasks: 1) discussing our methods: our method captures a portion of the data in DCD flowsheets, kidney perfusion data, and flowsheet data captured peri-organ recovery surgery; 2) demonstrating the result: we built a comprehensive, analyzable database from 2022 OPTN data. Even in this preliminary phase, this dataset is far larger than any previously available; and 3) proving that our methods can be extended to all past and future OPTN data. The scope of our study is all Organ Procurement and Transplantation Network (OPTN) data on US organ donors since 2008. The data could not be analyzed at a large scale in the past because it was captured in PDF documents known as "Attachments", whereby every donor's information was recorded in dozens of PDF documents in heterogeneous formats. To make the data analyzable, one needs to convert the…
Taxonomy
Topics: Organ Donation and Transplantation · Renal and Vascular Pathologies · Organ Transplantation Techniques and Outcomes
