TL;DR
This paper introduces SDC-GBB, a scalable global optimization algorithm for constrained clustering that efficiently handles large datasets with pairwise constraints, guaranteeing optimality and avoiding heuristic failures.
Contribution
The paper presents a novel decomposable branch-and-bound framework that significantly improves scalability and guarantees global optimality in constrained clustering.
Findings
Handles datasets with up to 1.5 million samples.
Achieves an optimality gap of less than 3%.
Outperforms existing methods by 200-1500 times in scalability.
Abstract
Constrained clustering leverages limited domain knowledge to improve clustering performance and interpretability, but incorporating pairwise must-link and cannot-link constraints is an NP-hard challenge, making global optimization intractable. Existing mixed-integer optimization methods are confined to small-scale datasets, limiting their utility. We propose Sample-Driven Constrained Group-Based Branch-and-Bound (SDC-GBB), a decomposable branch-and-bound (BB) framework that collapses must-linked samples into centroid-based pseudo-samples and prunes cannot-link constraints through geometric rules, while preserving convergence and guaranteeing global optimality. By integrating grouped-sample Lagrangian decomposition and geometric elimination rules for efficient lower and upper bounds, the algorithm attains highly scalable pairwise-constrained k-means clustering via parallelism. Experimental results…
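The must-link collapse described in the abstract can be illustrated with a minimal sketch. It assumes must-link constraints are transitive, so linked samples form groups, and that each group is replaced by its centroid weighted by the group size. The names `collapse_must_links` and `sq_dist` are hypothetical and are not taken from the paper; the paper's own construction may differ in detail.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def collapse_must_links(points, must_links):
    """Merge must-linked samples into weighted centroid pseudo-samples.

    Returns (centroids, weights, constant), where `constant` is the total
    within-group scatter that the collapse removes from the k-means objective.
    Hypothetical helper for illustration only.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)

    centroids, weights, constant = [], [], 0.0
    for members in groups.values():
        dim = len(points[members[0]])
        c = tuple(sum(points[i][d] for i in members) / len(members)
                  for d in range(dim))
        centroids.append(c)
        weights.append(len(members))
        constant += sum(sq_dist(points[i], c) for i in members)
    return centroids, weights, constant
```

For any candidate center `mu`, the cost of a group's original samples equals `weight * sq_dist(centroid, mu)` plus that group's share of `constant`. The collapsed problem therefore differs from the original by a term that does not depend on the centers, which is why the minimizer is unchanged.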
Peer Reviews
Decision·Submitted to ICLR 2026
1. The approach effectively combines mathematical rigor with computational efficiency, demonstrating the potential scalability of the mathematical programming (MP)-based method to large datasets.
2. In general, the paper is easy to follow.
1. The proposed technique appears highly similar to [1], raising concerns about its originality and incremental contribution beyond existing work.
2. In prior research, must-link constraints can also be represented by representative points while preserving the problem's hardness.
Addresses a hard and relevant problem. The algorithmic framework is well motivated and mathematically sound. Most of the experiments are extensive and the implementation seems professional. The writing quality is high overall.
The conceptual novelty is limited — most components (B&B structure, geometric elimination, decomposition) have appeared in previous global optimization literature. The claimed scalability relies heavily on large-scale hardware rather than clear algorithmic gains. Experimental comparisons are somewhat narrow, mostly against older baselines. The contribution feels more like a well-engineered variant than a new learning idea.
- The pseudo-sample reformulation is a provably exact transformation: Lemma 3.3 and Theorem 3.4 show that the replacement shifts the objective by a constant, so the global minimum is preserved.
- The method comes with a convergence proof that the BB scheme recovers the true optimum given exhaustive splitting.
- Geometric distance bounds eliminate sample-to-cluster assignments, pruning the BB search tree aggressively before branching.
- SDC-GBB can handle huge datasets. The authors demon
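The kind of geometric elimination mentioned in this review can be sketched as follows. This is a generic box-distance rule, not the paper's exact rule. It assumes each BB node confines every cluster center to an axis-aligned box, and that the sample has no cannot-link constraints, so it is assigned to its nearest center. The function names are hypothetical.

```python
def min_sq_dist_to_box(x, lo, hi):
    """Smallest squared distance from x to any point of the box [lo, hi]."""
    return sum(max(l - v, 0.0, v - h) ** 2 for v, l, h in zip(x, lo, hi))


def max_sq_dist_to_box(x, lo, hi):
    """Largest squared distance from x to any point of the box [lo, hi]."""
    return sum(max(abs(v - l), abs(v - h)) ** 2 for v, l, h in zip(x, lo, hi))


def feasible_clusters(x, boxes):
    """Indices of clusters that x could still be assigned to.

    `boxes` holds one (lo, hi) pair per cluster center. Cluster k is ruled
    out when even its closest possible center is farther from x than the
    farthest possible center of some other cluster.
    """
    best_worst_case = min(max_sq_dist_to_box(x, lo, hi) for lo, hi in boxes)
    return [k for k, (lo, hi) in enumerate(boxes)
            if min_sq_dist_to_box(x, lo, hi) <= best_worst_case]
```

A sample with a single surviving cluster has its assignment fixed at that node, which shrinks the subproblem before any further branching. Samples that carry cannot-link constraints need additional care, because their optimal assignment is not necessarily the nearest center.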
While the idea is interesting, some weaknesses in the proposed methodology need clarification:
- **Geometric Pruning Rules with Many Clusters or High-Dimensional Data** Branch-and-bound on cluster centers operates in an $mK$-dimensional continuous space. Geometric pruning rules are known to degrade in high dimensions or when $K$ is relatively large, which potentially weakens the elimination rules. The paper provides no theoretical justification or experimental analysis for $K>3$ or higher-dimension
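The reviewer's concern about the $mK$-dimensional search space can be made concrete with a small sketch of a common branching step: bisecting the current box of center coordinates along its widest edge. This summary does not state the paper's actual branching rule, so the rule and the name `bisect_widest_edge` are assumptions for illustration.

```python
def bisect_widest_edge(lo, hi):
    """Split the box [lo, hi] in half along its longest edge.

    `lo` and `hi` are flat lists of length m*K: the lower and upper bounds
    of all K cluster centers' m coordinates, stacked into one vector.
    Returns the two child boxes as (lo, hi) pairs.
    """
    d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[d] + hi[d]) / 2.0
    left_hi = hi[:d] + [mid] + hi[d + 1:]
    right_lo = lo[:d] + [mid] + lo[d + 1:]
    return (lo, left_hi), (right_lo, hi)
```

Each split halves only one of the $mK$ coordinates. Without pruning, covering a box of width $W$ per coordinate with boxes of edge length $\varepsilon$ takes on the order of $(W/\varepsilon)^{mK}$ leaves, so the elimination rules have to do most of the work as $m$ or $K$ grows.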
