Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis

Zihan Zhu; Xin Gai; Anru R. Zhang

arXiv:2405.03042·stat.ME·May 7, 2024

TL;DR

This paper introduces a selective inference method for testing differences between patient clusters in longitudinal EHR data. It corrects for the selection bias that clustering induces, so that hypothesis tests run after clustering remain valid.

Contribution

It extends classical selective inference methods from cross-sectional data to longitudinal data, provides upper bounds on the selective type-I and type-II errors, and applies the method to simulated data and real-world AKI EHR datasets.

Findings

01. Reduces the inflated type-I error that classical tests incur when applied after clustering

02. Provides upper bounds on the selective type-I and type-II errors

03. Shows improved inference accuracy on AKI EHR data

Abstract

In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post-clustering data often leads to an inflated type-I error. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach with upper bounds on the selective type-I and type-II errors. We apply the method to simulated data and real-world Acute Kidney Injury (AKI) EHR datasets, thereby…
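
The inflation described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method and is not longitudinal: it assumes a single scalar feature drawn from one normal distribution (so there are no true clusters), a plain 2-means split, and a normal approximation to Welch's two-sample test. All function names and parameters (`two_means_1d`, `welch_p_value`, `rejection_rate`, `n`, `reps`) are hypothetical choices made for this illustration.

```python
import math
import random
import statistics


def two_means_1d(xs, iters=20):
    """Plain Lloyd's algorithm with k=2 on scalar data; returns the two groups."""
    c0, c1 = min(xs), max(xs)  # start the centers at the extremes
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = statistics.fmean(g0), statistics.fmean(g1)
    return g0, g1


def welch_p_value(a, b):
    """Two-sided p-value for equal means (normal approximation to Welch's statistic)."""
    if len(a) < 2 or len(b) < 2:
        return 1.0  # cannot estimate a variance; count as a non-rejection
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def cluster_split(xs, rng):
    """Groups chosen by looking at the data (this is what induces selection bias)."""
    return two_means_1d(xs)


def random_split(xs, rng):
    """Groups chosen independently of the data values (no selection bias)."""
    shuffled = xs[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def rejection_rate(split, n=40, reps=500, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one population, no real clusters
        a, b = split(xs, rng)
        rejections += welch_p_value(a, b) < alpha
    return rejections / reps


naive_rate = rejection_rate(cluster_split)
baseline_rate = rejection_rate(random_split)
print(f"rejection rate after clustering:   {naive_rate:.3f}")
print(f"rejection rate after random split: {baseline_rate:.3f}")
```

Under these assumptions the test rejects far more often than the nominal 5% level when the groups come from clustering, while the random split stays close to it. The gap arises because the clusters are built to be separated, so a test that treats them as fixed in advance double-uses the data.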

Code & Models

Repositories

telvc/pmisf (Official)

Taxonomy

Topics: Machine Learning in Healthcare · Time Series Analysis and Forecasting