LLM StructCore: Schema-Guided Reasoning Condensation and Deterministic Compilation
Serhii Zabolotnii

TL;DR
This paper introduces a schema-guided, two-stage approach for filling clinical case report forms from notes, improving accuracy and robustness by decomposing the task into JSON summarization and deterministic compilation.
Contribution
It proposes a novel two-stage pipeline combining schema-guided reasoning with a deterministic compiler, addressing extreme sparsity and strict output constraints in clinical form filling.
Findings
Achieved macro-F1 scores of 0.6543 (EN) and 0.6905 (IT) on the dev set.
The submitted English variant scored 0.63 on the Codabench test set.
Language-agnostic pipeline performs well across English and Italian.
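For readers unfamiliar with the metric, the following is a minimal sketch of a generic macro-F1 computation: per-label F1 averaged without weighting. It is an illustration only; the official scorer for the task may group or weight items differently, and the function name `macro_f1` and the label strings are assumptions, not taken from the paper.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred.

    Generic textbook definition; NOT the official task scorer.
    """
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: a sparse form where most fields are "unknown".
gold = ["y", "n", "unknown", "unknown"]
pred = ["y", "unknown", "unknown", "unknown"]
score = macro_f1(gold, pred)
```

Because every label counts equally, a rare label that is missed entirely (here `"n"`) pulls the average down as much as a frequent one, which is one reason sparse forms are hard to score well on.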
Abstract
Automatically filling Case Report Forms (CRFs) from clinical notes is challenging due to noisy language, strict output contracts, and the high cost of false positives. We describe our CL4Health 2026 submission for Dyspnea CRF filling (134 items) using a contract-driven two-stage design grounded in Schema-Guided Reasoning (SGR). The key task property is extreme sparsity: the majority of fields are unknown, and official scoring penalizes both empty values and unsupported predictions. We shift from a single-step "LLM predicts 134 fields" approach to a decomposition where (i) Stage 1 produces a stable SGR-style JSON summary with exactly 9 domain keys, and (ii) Stage 2 is a fully deterministic, 0-LLM compiler that parses the Stage 1 summary, canonicalizes item names, normalizes predictions to the official controlled vocabulary, applies evidence-gated false-positive filters, and expands the…
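The Stage 2 steps named in the abstract (parse the Stage 1 summary, canonicalize item names, normalize to a controlled vocabulary, apply an evidence gate) can be sketched as below. This is a hedged illustration, not the author's compiler: the JSON layout, the alias table, the vocabulary, and the substring-based evidence check are all assumptions made for the example, and the final expansion step is omitted because the abstract is truncated before describing it.

```python
import json
import re

# Hypothetical tables for illustration; the real item names and
# controlled vocabulary come from the task definition, not from here.
ITEM_ALIASES = {"fever": "fever", "febrile": "fever", "chest pain": "chest_pain"}
VALUE_VOCAB = {"yes": "y", "present": "y", "no": "n", "absent": "n"}


def canonical_item(name):
    """Map a free-form item name to a canonical key, or None if unknown."""
    key = re.sub(r"[\s_\-]+", " ", name.strip().lower())
    return ITEM_ALIASES.get(key)


def normalize_value(value):
    """Map a free-form value to the (assumed) controlled vocabulary."""
    return VALUE_VOCAB.get(str(value).strip().lower())


def compile_summary(summary_json, note):
    """Deterministically turn a Stage 1 JSON summary into item predictions.

    Assumed input shape: {domain: [{"item", "value", "evidence"}, ...]}.
    """
    summary = json.loads(summary_json)
    note_lower = note.lower()
    predictions = {}
    for findings in summary.values():
        for finding in findings:
            item = canonical_item(finding.get("item", ""))
            value = normalize_value(finding.get("value", ""))
            evidence = str(finding.get("evidence", "")).strip().lower()
            if item is None or value is None:
                continue  # outside the known schema or vocabulary
            if not evidence or evidence not in note_lower:
                continue  # evidence gate: drop unsupported predictions
            predictions[item] = value
    return predictions


note = "Patient is febrile. Denies chest pain."
stage1 = json.dumps({
    "symptoms": [
        {"item": "Febrile", "value": "present", "evidence": "febrile"},
        {"item": "Chest-Pain", "value": "absent", "evidence": "denies chest pain"},
        {"item": "cough", "value": "yes", "evidence": "cough"},
    ]
})
result = compile_summary(stage1, note)
```

No model call occurs in this stage, so the same summary and note always yield the same predictions; the `cough` entry is dropped because it is not in the assumed schema.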
