Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR

Jifan Gao; Michael Rosenthal; Brian Wolpin; Simona Cristea

arXiv:2511.00782 · cs.AI · November 4, 2025

TL;DR

This study benchmarks count-based models against a pretrained transformer and a mixture-of-agents LLM pipeline for structured EHR prediction on the EHRSHOT dataset. It finds that count-based methods remain competitive while being simpler and more interpretable.

Contribution

The paper provides a direct comparison of count-based, pretrained transformer, and mixture-of-agents LLM pipeline methods on structured EHR data, highlighting the continued strength of count-based approaches.

Findings

01. Count-based models perform competitively with LLM pipelines.

02. No single method dominates across all tasks.

03. Count-based models offer simplicity and interpretability.

Abstract

Structured electronic health records (EHRs) are essential for clinical prediction. While count-based learners continue to perform strongly on such data, no benchmark has directly compared them against more recent mixture-of-agents LLM pipelines, which have been reported to outperform single LLMs in various NLP tasks. In this study, we evaluated three categories of methodologies for EHR prediction using the EHRSHOT dataset: (1) count-based models built from ontology roll-ups with two time bins, based on LightGBM and the tabular foundation model TabPFN; (2) a pretrained sequential transformer (CLMBR); and (3) a mixture-of-agents pipeline that converts tabular histories to natural-language summaries, which are then passed to a text classifier. Across the eight evaluation tasks, head-to-head wins were largely split between the count-based and the…
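
To make the count-based featurization concrete, here is a minimal sketch of counting rolled-up codes in two time bins. It is an illustration under stated assumptions, not the authors' implementation: the 365-day cutoff, the dot-truncation roll-up rule, the example codes, and the names `roll_up`, `count_features`, and `to_vector` are all hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical cutoff separating the "recent" bin from the "older" bin;
# the paper's actual bin boundaries are not given in the abstract.
RECENT_WINDOW_DAYS = 365


def roll_up(code: str) -> str:
    """Toy ontology roll-up (assumption): map a code to its parent
    by dropping everything after the first dot."""
    return code.split(".", 1)[0]


def count_features(events, recent_window_days=RECENT_WINDOW_DAYS):
    """Count rolled-up codes separately in a recent and an older time bin.

    events: iterable of (code, days_before_prediction) pairs.
    Returns a Counter keyed by (parent_code, bin_name).
    """
    counts = Counter()
    for code, days_before in events:
        bin_name = "recent" if days_before <= recent_window_days else "older"
        counts[(roll_up(code), bin_name)] += 1
    return counts


def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed feature order, using 0 for absent keys."""
    return [counts.get(key, 0) for key in vocabulary]


# Made-up patient history: (code, days before the prediction time).
patient = [("E11.9", 30), ("E11.65", 400), ("I10", 10), ("E11.9", 200)]
features = count_features(patient)
vocabulary = sorted(features)
vector = to_vector(features, vocabulary)
```

Vectors of this kind, one per patient over a shared vocabulary, are the sort of tabular input a gradient-boosted learner such as LightGBM consumes. The roll-up merges sibling codes into one feature, and the two bins keep a coarse notion of recency without modeling the full event sequence.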

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Electronic Health Records Systems