A distance for mixed-variable and hierarchical domains with meta variables
Edward Hallé-Hannan, Charles Audet, Youssef Diouane, Sébastien Le Digabel, Paul Saves

TL;DR
This paper introduces a modeling framework and a distance for heterogeneous, hierarchical datasets with mixed variable types and meta variables, so that whole datasets can be used in machine learning tasks instead of being partitioned into smaller ones.
Contribution
It presents a framework that generalizes hierarchical, tree-structured, variable-size and conditional search frameworks to mixed-variable domains, and a distance that compares points that do not share the same variables.
Findings
Effective in regression and classification tasks
Allows use of entire heterogeneous datasets
Improves model performance with mixed-variable data
Abstract
Heterogeneous datasets emerge in various machine learning and optimization applications that feature different input sources, types or formats. Most models or methods do not natively tackle heterogeneity. Hence, such datasets are often partitioned into smaller and simpler ones, which may limit the generalizability or performance, especially when data is limited. The first main contribution of this work is a modeling framework that generalizes hierarchical, tree-structured, variable-size or conditional search frameworks. The framework models mixed-variable and hierarchical domains in which variables may be continuous, integer, or categorical, with some identified as meta when they influence the structure of the problem. The second main contribution is a novel distance that compares any pair of mixed-variable points that do not share the same variables, allowing to use whole heterogeneous…
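To make the idea of meta variables concrete, here is a minimal Python sketch. It is not the distance defined in the paper. The example domain (a network whose `n_layers` meta variable decides which `units_i` variables exist), the helper names `make_point`, `var_dist` and `toy_distance`, and the fixed `excluded_penalty` are all assumptions made for illustration only.

```python
from math import sqrt


def make_point(n_layers, units, optimizer, lr):
    """Build a point of a hypothetical hierarchical domain.

    "n_layers" plays the role of a meta variable: its value decides
    which "units_i" variables are included in the point.
    """
    point = {"n_layers": n_layers, "optimizer": optimizer, "lr": lr}
    for i in range(n_layers):
        point[f"units_{i}"] = units[i]
    return point


def var_dist(a, b):
    """Per-variable distance: 0/1 mismatch for categorical values,
    absolute difference for numeric values (assumed, unscaled)."""
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return abs(a - b)


def toy_distance(x, y, excluded_penalty=1.0):
    """Illustrative distance between points with different variables.

    Variables present in both points contribute their per-variable
    distance; a variable present in only one point contributes a
    fixed penalty (an assumption of this sketch, not the paper's rule).
    """
    total = 0.0
    for name in set(x) | set(y):
        if name in x and name in y:
            total += var_dist(x[name], y[name]) ** 2
        else:
            total += excluded_penalty ** 2
    return sqrt(total)


x = make_point(1, [32], "adam", 0.1)
y = make_point(2, [32, 64], "sgd", 0.1)
print(toy_distance(x, y))
```

In this sketch the two points differ in the meta variable, in one categorical variable, and in one variable that only `y` includes, so the two points remain comparable even though they do not share the same variables. Numeric variables are left unscaled here; a practical version would likely normalize them first.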
Taxonomy
Topics: Constraint Satisfaction and Optimization
