Composite Silhouette: A Subsampling-based Aggregation Strategy

Aggelos Semoglou; Aristidis Likas; John Pavlopoulos

arXiv:2604.13816 · cs.LG · April 16, 2026

TL;DR

The paper introduces Composite Silhouette, an internal validation criterion that combines micro- and macro-averaged Silhouette scores across repeated subsampled clusterings to improve estimation of the number of clusters.

Contribution

The paper proposes a subsampling-based aggregation strategy that balances the biases of micro- and macro-averaged Silhouette scores to select the number of clusters more reliably.

Findings

01

Composite Silhouette outperforms traditional methods on synthetic datasets.

02

The method provides finite-sample guarantees for its estimates.

03

Experiments show improved cluster-count estimation accuracy on real-world data.

Abstract

Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard micro-averaged form tends to favor larger clusters under size imbalance. Macro-averaging mitigates this bias by weighting clusters equally, but may overemphasize noise from under-represented groups. We introduce Composite Silhouette, an internal criterion for cluster-count selection that aggregates evidence across repeated subsampled clusterings rather than relying on a single partition. For each subsample, micro- and macro-averaged Silhouette scores are combined through an adaptive convex weight determined by their normalized discrepancy and smoothed by a bounded nonlinearity; the final score is then obtained by averaging these subsample-level composites. We…
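The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization of the discrepancy or the bounded nonlinearity, so the sketch assumes a discrepancy normalized by the score magnitudes and a `tanh` squashing. The base clusterer (a plain Lloyd-style k-means), the direction of the weighting, the helper names (`silhouette_values`, `kmeans_labels`, `composite_silhouette`), and the parameters `n_subsamples` and `frac` are all hypothetical choices made for the example.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point Silhouette values s(i) = (b - a) / max(a, b)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    ids = np.unique(labels)
    s = np.zeros(len(X))
    if len(ids) < 2:                 # Silhouette is undefined for a single cluster
        return s
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:               # singleton clusters get s(i) = 0 by convention
            continue
        a = D[i, own].sum() / (n_own - 1)   # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            s[i] = (b - a) / denom
    return s

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd iterations with random initial centers (assumed base clusterer)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def composite_silhouette(X, k, n_subsamples=20, frac=0.8, seed=0):
    """Average of per-subsample convex combinations of micro and macro Silhouette."""
    rng = np.random.default_rng(seed)
    m = int(frac * len(X))
    composites = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(X), size=m, replace=False)   # subsample without replacement
        labels = kmeans_labels(X[idx], k, rng)
        s = silhouette_values(X[idx], labels)
        micro = s.mean()                                  # every point weighted equally
        macro = np.mean([s[labels == c].mean() for c in np.unique(labels)])  # every cluster weighted equally
        # ASSUMPTION: normalized discrepancy in [-1, 1], smoothed by tanh.
        delta = (macro - micro) / (abs(macro) + abs(micro) + 1e-12)
        w = 0.5 * (1.0 + np.tanh(delta))                  # convex weight in (0, 1)
        composites.append(w * micro + (1.0 - w) * macro)
    return float(np.mean(composites))

# Toy usage: three well-separated, imbalanced Gaussian blobs (synthetic data).
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal([0.0, 0.0], 0.5, size=(80, 2)),
    data_rng.normal([10.0, 0.0], 0.5, size=(50, 2)),
    data_rng.normal([0.0, 10.0], 0.5, size=(20, 2)),
])
scores = {k: composite_silhouette(X, k) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because each subsample-level composite is a convex combination, it always lies between the micro- and macro-averaged scores for that subsample; the candidate cluster count with the highest averaged composite is the one selected.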
