MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts
Naoto Iwase, Hiroki Okuyama, Junichiro Iwasawa

TL;DR
MedRECT is a novel cross-lingual benchmark for evaluating and improving large language models' ability to detect, localize, and correct errors in clinical texts across English and Japanese, advancing safe medical AI deployment.
Contribution
This paper introduces MedRECT, the first comprehensive cross-lingual benchmark for medical error correction, together with a scalable data-generation pipeline and an evaluation of diverse LLMs, including fine-tuned models.
Findings
Reasoning models outperform standard architectures in error detection and localization.
Cross-lingual evaluation shows 5-10% performance gaps between English and Japanese.
Fine-tuning improves error correction, surpassing human experts in structured tasks.
Abstract
Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts -- a prerequisite for safe deployment -- remains under-evaluated, particularly beyond English. We introduce MedRECT, a cross-lingual benchmark (Japanese/English) that formulates medical error handling as three subtasks: error detection, error localization (sentence extraction), and error correction. MedRECT is built with a scalable, automated pipeline from the Japanese Medical Licensing Examinations (JMLE) and a curated English counterpart, yielding MedRECT-ja (663 texts) and MedRECT-en (458 texts) with comparable error/no-error balance. We evaluate 9 contemporary LLMs spanning proprietary, open-weight, and reasoning families. Key findings: (i) reasoning models substantially outperform standard architectures, with up to 13.5% relative improvement…
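The three-subtask formulation in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `Example` and `Prediction` record layouts, the field names, and the two scoring functions are hypothetical and are not taken from the MedRECT release or its official metrics. Correction scoring is omitted because the abstract does not say how corrections are judged. The `relative_improvement` helper only shows the arithmetic behind a "relative improvement" figure.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Example:
    """Hypothetical gold record: a clinical text split into sentences."""
    sentences: List[str]
    error_index: Optional[int]  # index of the erroneous sentence, None if the text is error-free
    correction: Optional[str]   # corrected sentence, None if the text is error-free


@dataclass
class Prediction:
    """Hypothetical model output covering the three subtasks."""
    has_error: bool             # subtask 1: error detection
    error_index: Optional[int]  # subtask 2: error localization (sentence extraction)
    correction: Optional[str]   # subtask 3: error correction


def detection_accuracy(golds: Sequence[Example], preds: Sequence[Prediction]) -> float:
    """Fraction of texts where the error / no-error decision is right."""
    hits = sum((g.error_index is not None) == p.has_error for g, p in zip(golds, preds))
    return hits / len(golds)


def localization_accuracy(golds: Sequence[Example], preds: Sequence[Prediction]) -> float:
    """Fraction of erroneous texts where the predicted sentence index matches the gold one."""
    pairs = [(g, p) for g, p in zip(golds, preds) if g.error_index is not None]
    if not pairs:
        return 0.0
    hits = sum(p.has_error and p.error_index == g.error_index for g, p in pairs)
    return hits / len(pairs)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.50 -> 0.55 is a 10% relative gain."""
    return (improved - baseline) / baseline
```

Scoring localization only over texts that actually contain an error is one possible design choice; it keeps the localization score from being inflated by error-free texts.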
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare
