SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning
Shubham Mohole, Sainyam Galhotra

TL;DR
SIFOTL is a novel, privacy-preserving method for identifying data shift factors in tabular healthcare datasets, using summary statistics and twin models to outperform existing approaches in noisy, real-world scenarios.
Contribution
The paper introduces SIFOTL, a new approach that combines privacy-safe summaries, twin XGBoost models, and Pareto decision trees for robust, interpretable shift analysis in tabular data.
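The "privacy-safe summaries" component can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names `summarize` and `rate_shifts`, the `MIN_CELL_COUNT` suppression threshold, and the synthetic `plan`/`denied` columns are all hypothetical. It aggregates row-level records into per-segment counts and outcome rates, drops small cells, and compares two periods using only those aggregates.

```python
from collections import defaultdict

# Hypothetical suppression threshold: cells with fewer rows are dropped so that
# no summary describes a very small group. The value is an assumption.
MIN_CELL_COUNT = 10


def summarize(rows, features, outcome):
    """Aggregate rows into per-(feature, value) counts and mean outcome rates."""
    totals = defaultdict(lambda: [0, 0.0])  # (feature, value) -> [count, outcome sum]
    for row in rows:
        for feature in features:
            cell = totals[(feature, row[feature])]
            cell[0] += 1
            cell[1] += row[outcome]
    return {
        key: {"n": n, "rate": s / n}
        for key, (n, s) in totals.items()
        if n >= MIN_CELL_COUNT  # suppress small cells
    }


def rate_shifts(baseline, current):
    """Rank cells present in both summaries by absolute change in outcome rate."""
    shared = baseline.keys() & current.keys()
    deltas = {k: current[k]["rate"] - baseline[k]["rate"] for k in shared}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic data: the outcome rate rises only for plan "A" in the second period.
before = [{"plan": p, "denied": 0} for p in ["A"] * 20 + ["B"] * 20]
after = (
    [{"plan": "A", "denied": 1} for _ in range(10)]
    + [{"plan": "A", "denied": 0} for _ in range(10)]
    + [{"plan": "B", "denied": 0} for _ in range(20)]
)

ranked = rate_shifts(
    summarize(before, ["plan"], "denied"),
    summarize(after, ["plan"], "denied"),
)
```

Only the aggregated dictionaries cross the boundary between the two steps, which is the property the summary describes; the twin XGBoost models and the Pareto decision tree are not modeled here.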
Findings
Achieves high F1 scores (0.85-0.96) on real-world healthcare datasets.
Outperforms existing methods such as BigQuery Contribution Analysis and statistical tests.
Remains effective under observational noise, maintaining F1 >= 0.75.
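For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. This summary does not state how the paper scores recovered shift factors, so the set-based comparison below is an assumption made for illustration, and the function name `f1_score_sets` and the factor labels are hypothetical.

```python
def f1_score_sets(predicted, actual):
    """F1 between a set of predicted shift factors and the true set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # also covers empty inputs
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted factors are correct, and one true factor is missed.
score = f1_score_sets(
    {"plan=A", "region=West", "age=65+"},
    {"plan=A", "region=West", "provider=12"},
)
```

Here precision and recall are both 2/3, so the resulting F1 is also 2/3.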
Abstract
Identifying the factors driving data shifts in tabular datasets is a significant challenge for analysis and decision support systems, especially those focusing on healthcare. Privacy rules restrict data access, and noise from complex processes hinders analysis. To address this challenge, we propose SIFOTL (Statistically-Informed Fidelity-Optimization Method for Tabular Learning), which (i) extracts privacy-compliant data summary statistics, (ii) employs twin XGBoost models to disentangle intervention signals from noise with assistance from LLMs, and (iii) merges XGBoost outputs via a Pareto-weighted decision tree to identify interpretable segments responsible for the shift. Unlike existing analyses, which may ignore noise or require full data access for LLM-based analysis, SIFOTL addresses both challenges using only privacy-safe summary statistics. Demonstrating its real-world efficacy,…
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization
