A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials

Pratik Worah

arXiv:2506.10879 · q-bio.QM · November 25, 2025

TL;DR

This paper introduces an efficient approximation algorithm, inspired by the Goemans-Williamson algorithm, for identifying homogeneous subgroups within large, heterogeneous clinical datasets, with applications to personalized medicine.

Contribution

It presents a novel rounding technique, similar to that of Goemans and Williamson (1995), that approximates the optimal solution within a factor of 0.82 and is tailored to subgroup detection in clinical data analysis.
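The paper's specific rounding scheme is not spelled out here, so as a point of reference the sketch below shows only the classic Goemans-Williamson random-hyperplane rounding step for MAX-CUT (the setting of the well-known guarantee of about 0.878), not the author's modified technique. It assumes the unit vectors have already been produced by a semidefinite relaxation; here they are supplied by hand for a triangle graph. The function names `hyperplane_round`, `cut_weight`, and `best_of_trials` are hypothetical.

```python
import math
import random


def hyperplane_round(vectors, rng):
    """Assign +1/-1 labels by the side of one random hyperplane each vector falls on.

    `vectors` are unit vectors, assumed to come from an SDP relaxation.
    """
    dim = len(vectors[0])
    # A standard Gaussian vector gives a uniformly random hyperplane normal.
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(a * b for a, b in zip(v, r)) >= 0 else -1 for v in vectors]


def cut_weight(labels, edges):
    """Total weight of edges whose endpoints receive different labels."""
    return sum(w for i, j, w in edges if labels[i] != labels[j])


def best_of_trials(vectors, edges, trials=50, seed=0):
    """Repeat the rounding and keep the heaviest cut found."""
    rng = random.Random(seed)
    best_labels, best_value = None, -1.0
    for _ in range(trials):
        labels = hyperplane_round(vectors, rng)
        value = cut_weight(labels, edges)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Toy instance: a unit-weight triangle, whose SDP optimum places the three
# vectors 120 degrees apart in the plane (hand-supplied, not solved here).
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
vectors = [[math.cos(t), math.sin(t)] for t in angles]
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
labels, value = best_of_trials(vectors, edges)
print(labels, value)  # value is 2.0: two of the three edges are cut
```

In this rounding, edge (i, j) is cut with probability θ_ij/π, where θ_ij is the angle between v_i and v_j. For vectors 120° apart that is 2/3 per edge, and on this instance every hyperplane separates one vertex from the other two, so each trial cuts exactly two edges.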

Findings

01. Identified clinically relevant, predominantly homogeneous subcohorts of patients.

02. Suggested a potential link between LXR over-expression and BRCA2 and MSH6 methylation levels in one such subcohort.

03. Demonstrated the algorithm on the breast cancer RNA microarray dataset of Curtis et al. (2012).

Abstract

We design an efficient algorithm that outputs tests for identifying predominantly homogeneous subcohorts of patients from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1995), that approximates the optimal solution within a factor of $0.82$. As an application, we use our algorithm to trade off sensitivity for specificity to systematically identify clinically interesting homogeneous subcohorts of patients in the RNA microarray dataset for breast cancer from Curtis et al. (2012). One such clinically interesting subcohort suggests a link between LXR over-expression and BRCA2 and MSH6 methylation levels for patients in that subcohort.
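To make the sensitivity/specificity trade-off concrete, the sketch below scores a hypothetical single-feature threshold test (`score > t`) against membership in a subcohort. The scores and labels are synthetic toy values, not data from Curtis et al. (2012), and the threshold test is an illustrative assumption rather than a test output by the paper's algorithm; `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for boolean predictions vs. labels.

    Assumes both classes are present in `actual`; otherwise a denominator is 0.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic toy values (not from any real dataset).
scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
in_subcohort = [False, False, True, False, True, True, False, True]

# Raising the threshold flags fewer patients: sensitivity falls, specificity rises.
for t in (0.3, 0.55, 0.85):
    predicted = [s > t for s in scores]
    sens, spec = sensitivity_specificity(predicted, in_subcohort)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On these toy values the three thresholds give (sensitivity, specificity) of (1.00, 0.25), (0.75, 0.50), and (0.25, 0.75), so a stricter test admits a purer but smaller group of patients.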