HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation
Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal

TL;DR
HI-PMK introduces a novel data-dependent kernel that effectively handles incomplete and heterogeneous data without imputation, improving performance in classification and clustering tasks across various missing data scenarios.
Contribution
The paper proposes HI-PMK, a new kernel that directly models incomplete heterogeneous data using probability mass-based dissimilarity and a missingness-aware uncertainty strategy, avoiding imputation.
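For intuition, here is the general form that probability mass-based dissimilarities take in earlier data-dependent work. It is a sketch that assumes HI-PMK builds on the same m_p-style form. The symbols n, d, p and R_i are introduced here for illustration, and the paper's exact per-feature regions and its treatment of missing entries are not reproduced.

```latex
m_p(\mathbf{x}, \mathbf{y}) = \left( \frac{1}{d} \sum_{i=1}^{d}
  \left( \frac{\lvert R_i(\mathbf{x}, \mathbf{y}) \rvert}{n} \right)^{p} \right)^{1/p}
```

Here n is the number of instances and d the number of features. R_i(x, y) is the set of instances whose i-th value lies in the smallest region covering x_i and y_i. Two values count as dissimilar when many instances fall between them, so the same numeric gap weighs more in a dense part of the distribution than in a sparse one.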
Findings
Outperforms traditional imputation-based methods on 15 benchmark datasets.
Effectively handles data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
Scalable and privacy-preserving approach for real-world applications.
Abstract
Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical). Existing methods either rely on imputation, which may introduce bias or privacy risks, or fail to jointly address data heterogeneity and structured missingness. We propose the Heterogeneous Incomplete Probability Mass Kernel (HI-PMK), a novel data-dependent representation learning approach that eliminates the need for imputation. HI-PMK introduces two key innovations: (1) a probability mass-based dissimilarity measure that adapts to local data distributions across heterogeneous features (numerical, ordinal, nominal), and (2) a missingness-aware uncertainty strategy (MaxU) that conservatively handles all…
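The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions:

- It uses an m_p-style dissimilarity, which averages per-feature probability masses raised to a power p.
- Ordinal features are ordered integer codes.
- For nominal features, the region is the set of instances sharing either value.
- A hypothetical stand-in replaces the truncated MaxU description: when either value of a feature is missing, that feature contributes the maximum mass of 1.0.

The names feature_mass and mass_dissimilarity are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[object]]  # None marks a missing value


def feature_mass(column: Sequence[Optional[object]], a, b, kind: str) -> float:
    """Fraction of observed values in `column` that fall in the region covering a and b."""
    observed = [v for v in column if v is not None]
    if not observed:
        return 1.0  # nothing observed: treat the region as covering everything
    if kind == "nominal":
        # Assumption: the region is the set of instances sharing value a or value b.
        count = sum(1 for v in observed if v == a or v == b)
    elif kind in ("numerical", "ordinal"):
        # Assumption: ordinal values are encoded as ordered integers.
        lo, hi = min(a, b), max(a, b)
        count = sum(1 for v in observed if lo <= v <= hi)
    else:
        raise ValueError(f"unknown feature kind: {kind!r}")
    return count / len(observed)


def mass_dissimilarity(x: Row, y: Row, data: List[Row], kinds: Sequence[str],
                       p: float = 2.0) -> float:
    """m_p-style dissimilarity over mixed features, with a hypothetical max-mass rule for missing values."""
    total = 0.0
    for i, kind in enumerate(kinds):
        if x[i] is None or y[i] is None:
            # Hypothetical stand-in for MaxU: a comparison involving a missing value
            # is treated as maximally uncertain and gets the largest possible mass, 1.0.
            mass = 1.0
        else:
            column = [row[i] for row in data]
            mass = feature_mass(column, x[i], y[i], kind)
        total += mass ** p
    return (total / len(kinds)) ** (1.0 / p)


if __name__ == "__main__":
    data = [
        [1.0, "red"],
        [2.0, "blue"],
        [3.0, "red"],
        [10.0, None],
        [None, "green"],
    ]
    kinds = ["numerical", "nominal"]
    print(mass_dissimilarity(data[0], data[1], data, kinds))  # approx. 0.6374
    print(mass_dissimilarity(data[0], data[3], data, kinds))  # 1.0
```

In this toy data, rows 0 and 3 reach the maximum dissimilarity of 1.0. Their numerical values span every observed value, and their nominal comparison involves a missing entry. Under mass-based measures, a row's dissimilarity to itself is not zero, because the region around a single value still holds some mass.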
Taxonomy
Topics: Neural Networks and Applications
