MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports
Yingyun Li, Yu Wang, Haiyang Qian

TL;DR
MedStruct-S is a benchmark designed to evaluate key discovery, key-conditioned question answering, and semi-structured extraction from OCR clinical reports, addressing real-world challenges like unknown keys and OCR noise.
Contribution
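It introduces MedStruct-S, a benchmark of 3,582 annotated real-world clinical report pages, and uses it to compare encoder-only and decoder-only models on key discovery, key-conditioned QA, and key-value pair extraction under unknown keys and OCR noise.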
Findings
Encoder-only models perform best on key-conditioned QA despite their smaller size.
Fine-tuned decoder-only models achieve the strongest overall results.
The benchmark enables reliable comparison of models in semi-structured IE tasks.
Abstract
Semi-structured information extraction (IE) from OCR-derived clinical reports is crucial for efficiently reconstructing patients' longitudinal medical histories. In practice, this scenario commonly involves three tasks: (i) field-header (key) discovery, (ii) key-conditioned question answering (QA), and (iii) end-to-end key-value pair extraction. However, existing evaluations often under-model two factors: heterogeneous and incompletely known key representations, and OCR-induced noise. This makes it difficult to assess model robustness in real-world settings. We present MedStruct-S, a benchmark specifically designed to evaluate these tasks under unknown keys and OCR noise. MedStruct-S contains 3,582 annotated real-world clinical report pages. Using MedStruct-S, we benchmark two representative paradigms: encoder-only sequence labeling with post-processing and decoder-only structured…
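To make the three tasks concrete, here is a minimal Python sketch of their input and output shapes on a toy page. Everything in it is an assumption made for illustration: the sample text, the function names, and the naive "key: value" regex are hypothetical and are not the benchmark's data, API, or either of the evaluated model paradigms.

```python
import re

# Hypothetical, clean OCR text; not taken from MedStruct-S.
PAGE = "Patient Name: Zhang San\nAge: 45\nDiagnosis: Type 2 diabetes"

# Naive one-pair-per-line "key: value" pattern (an illustrative assumption).
PAIR = re.compile(r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def extract_pairs(text):
    """Task (iii): end-to-end key-value pair extraction."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]


def discover_keys(text):
    """Task (i): key discovery, with no predefined key list."""
    return [key for key, _ in extract_pairs(text)]


def answer_for_key(text, key):
    """Task (ii): key-conditioned QA, returning the value for a given key."""
    for found_key, value in extract_pairs(text):
        if found_key.lower() == key.lower():
            return value
    return None


if __name__ == "__main__":
    print(discover_keys(PAGE))
    print(answer_for_key(PAGE, "age"))
    print(extract_pairs(PAGE))
```

A rule like this only works when keys are cleanly delimited. A misrecognized colon or a header split across lines would defeat it, which is the kind of OCR noise and key variability the abstract says existing evaluations under-model.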
