Semantic Risk Scoring of Aggregated Metrics: An AI-Driven Approach for Healthcare Data Governance

Mohammed Omer Shakeel Ahmed

arXiv:2603.07924 · cs.LG · March 10, 2026

Open Access

TL;DR

This paper introduces an AI-driven framework that scores the privacy risk of healthcare data metrics by analyzing the SQL queries that define them, with the aim of preventing overexposure, supporting compliance, and enabling secure data sharing.

Contribution

It presents a novel static, explainable risk scoring system for SQL-based healthcare metrics using semantic and syntactic analysis with pretrained embeddings and machine learning.

Findings

01 High accuracy in risk detection (>85%)

02 Effective flagging of sensitive query patterns

03 Supports privacy-preserving healthcare data governance

Abstract

Large healthcare institutions typically operate multiple business intelligence (BI) teams segmented by domain, including clinical performance, fundraising, operations, and compliance. Due to HIPAA, FERPA, and IRB restrictions, these teams face challenges in sharing the patient-level data needed for analytics. To mitigate this, a metric aggregation table is proposed: a precomputed, privacy-compliant summary. Such abstractions enable decision-making without direct access to sensitive data. However, even aggregated metrics can inadvertently create privacy risks if constructed without rigorous safeguards. A modular AI framework is proposed that evaluates SQL-based metric definitions for potential overexposure using both semantic and syntactic features. Specifically, the system parses SQL queries into abstract syntax trees (ASTs), extracts sensitive patterns (e.g., fine-grained GROUP…
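As a rough illustration of the syntactic side of this idea, the sketch below pulls the GROUP BY columns out of a metric definition and reports which of them look like quasi-identifiers. It is not the paper's implementation: it substitutes a simple regular expression for the AST parsing the abstract describes, and the names `SENSITIVE_COLUMNS`, `group_by_columns`, and `risk_features`, along with the example column names, are hypothetical.

```python
import re

# Hypothetical set of quasi-identifier columns; an assumption for this sketch,
# not a list taken from the paper.
SENSITIVE_COLUMNS = {"zip_code", "date_of_birth", "diagnosis_code", "patient_id"}


def group_by_columns(sql: str) -> list[str]:
    """Return the expressions in the first GROUP BY clause.

    Regex-based stand-in for AST parsing: it does not handle subqueries,
    comments, or commas nested inside function calls.
    """
    match = re.search(
        r"\bGROUP\s+BY\s+(.*?)(?=\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
        sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return []
    return [col.strip().lower() for col in match.group(1).split(",") if col.strip()]


def risk_features(sql: str) -> dict:
    """Extract simple syntactic features that a downstream risk scorer could use."""
    cols = group_by_columns(sql)
    # Strip any table qualifier (e.g. "v.zip_code") before checking the column name.
    sensitive = [c for c in cols if c.split(".")[-1] in SENSITIVE_COLUMNS]
    return {
        "group_by_count": len(cols),       # more grouping columns means finer-grained cells
        "sensitive_group_by": sensitive,   # grouping columns that look like quasi-identifiers
    }


if __name__ == "__main__":
    query = (
        "SELECT zip_code, date_of_birth, COUNT(*) "
        "FROM visits GROUP BY zip_code, date_of_birth"
    )
    print(risk_features(query))
```

Under these assumptions, a query that groups by both `zip_code` and `date_of_birth` would be reported with two grouping columns, both flagged as sensitive, which is the kind of fine-grained grouping the abstract identifies as a risk signal. A production system would replace the regex with a real SQL parser and combine such features with semantic ones.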

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Access Control and Trust