Problem-oriented AutoML in Clustering
Matheus Camilo da Silva, Gabriel Marques Tavares, Eric Medvet, and Sylvio Barbon Junior

TL;DR
PoAC is a flexible, problem-oriented AutoML framework for clustering that dynamically links problem context, validation indexes, and meta-features, using a surrogate model trained on extensive meta-knowledge to optimize solutions.
Contribution
The paper introduces an adaptable AutoML approach that connects the specifics of a clustering problem with validation indexes and meta-features, rather than relying on a fixed validation metric as traditional methods do.
Findings
Outperforms state-of-the-art AutoML frameworks on multiple datasets
Effectively adapts to different clustering tasks without retraining
Excels in data visualization and dynamic pipeline adjustment
Abstract
The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast, PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components based on the specific context and goals of their task. At its core, PoAC employs a surrogate model trained on a large meta-knowledge base of previous clustering datasets and solutions, enabling it to infer the quality of new clustering pipelines and synthesize optimal solutions for unseen datasets. Unlike many AutoML frameworks that are constrained by fixed…
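The surrogate-model idea described in the abstract can be illustrated with a minimal sketch. This is not PoAC's actual implementation: the nearest-neighbour regressor, the two-value meta-feature vectors, the one-hot pipeline encodings, the pipeline names, and every score below are assumptions invented for illustration. The sketch only shows the general pattern of learning from (dataset meta-features, pipeline, observed quality) records and using the predictions to rank candidate pipelines for an unseen dataset.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical meta-knowledge base: one record per (dataset, pipeline) pair.
# Each record holds dataset meta-features, a pipeline encoding, and the
# quality score observed for that pair. All values are made up.
Record = Tuple[Sequence[float], Sequence[float], float]

META_KNOWLEDGE: List[Record] = [
    ((0.1, 0.9), (1.0, 0.0), 0.80),  # dataset A, pipeline "kmeans"
    ((0.1, 0.9), (0.0, 1.0), 0.40),  # dataset A, pipeline "dbscan"
    ((0.9, 0.2), (1.0, 0.0), 0.30),  # dataset B, pipeline "kmeans"
    ((0.9, 0.2), (0.0, 1.0), 0.70),  # dataset B, pipeline "dbscan"
]

# Hypothetical candidate pipelines, one-hot encoded.
CANDIDATES: Dict[str, Sequence[float]] = {
    "kmeans": (1.0, 0.0),
    "dbscan": (0.0, 1.0),
}


def predict_quality(meta: Sequence[float], pipeline: Sequence[float], k: int = 1) -> float:
    """Surrogate stand-in: average the scores of the k nearest known records."""
    query = list(meta) + list(pipeline)
    scored = sorted(
        (math.dist(query, list(m) + list(p)), score)
        for m, p, score in META_KNOWLEDGE
    )
    nearest = scored[:k]
    return sum(score for _, score in nearest) / len(nearest)


def recommend(meta: Sequence[float], candidates: Dict[str, Sequence[float]], k: int = 1) -> str:
    """Return the candidate pipeline with the highest predicted quality."""
    return max(candidates, key=lambda name: predict_quality(meta, candidates[name], k))


if __name__ == "__main__":
    unseen_dataset = (0.2, 0.8)  # meta-features of a dataset not in the base
    print(recommend(unseen_dataset, CANDIDATES))
```

In this toy setup the unseen dataset resembles dataset A, so the surrogate favours the pipeline that scored best there, without running any clustering on the new data. Swapping the score stored in the records for a different validity index would change the ranking, which is the kind of problem-dependent customization the abstract describes.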
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Advanced Data Processing Techniques
Methods: Balanced Selection
