Mixing Configurations for Downstream Prediction
Juntang Wang, Hao Wu, Runkun Guo, Yihan Wang, Dongmian Zou, Shixin Xu

TL;DR
This paper introduces GraMixC, a novel module that extracts and aligns hierarchical configurations in data, significantly enhancing downstream prediction performance across various tasks.
Contribution
The paper formally characterizes configurations in clustering and vision transformers, and proposes GraMixC, a plug-and-play module that improves prediction accuracy by leveraging these configurations.
Findings
GraMixC raises the R² score from 0.6 to 0.9 on an rRNA prediction task.
It outperforms static-feature baselines on tabular benchmarks.
Configurations reduce redundancy and improve task-specific learning.
Abstract
Humans possess an innate ability to group objects by similarity, a cognitive mechanism that clustering algorithms aim to emulate. Recent advances in community detection have enabled the discovery of configurations -- valid hierarchical clusterings across multiple resolution scales -- without requiring labeled data. In this paper, we formally characterize these configurations and identify similar emergent structures in register tokens within Vision Transformers. Unlike register tokens, configurations exhibit lower redundancy and eliminate the need for ad hoc selection. They can be learned through unsupervised or self-supervised methods, yet their selection or composition remains specific to the downstream task and input. Building on these insights, we introduce GraMixC, a plug-and-play module that extracts configurations, aligns them using our Reverse Merge/Split (RMS) technique, and…
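As a rough illustration of what "hierarchical clusterings across multiple resolution scales" can mean, the sketch below assumes that a hierarchy is a list of label vectors, ordered from coarsest to finest, in which every finer cluster sits inside exactly one coarser cluster. This nesting criterion, the function names, and the toy labels are assumptions made for illustration. They are not the paper's formal definition of a configuration, and the sketch does not implement GraMixC or the RMS alignment step.

```python
from typing import Dict, List, Sequence


def is_refinement(fine: Sequence[int], coarse: Sequence[int]) -> bool:
    """Return True if each cluster in `fine` lies inside one cluster of `coarse`."""
    if len(fine) != len(coarse):
        raise ValueError("label vectors must cover the same points")
    parent: Dict[int, int] = {}
    for f, c in zip(fine, coarse):
        # Record the first coarse label seen for this fine cluster;
        # any later disagreement means the fine cluster straddles two parents.
        if parent.setdefault(f, c) != c:
            return False
    return True


def is_nested_hierarchy(levels: List[Sequence[int]]) -> bool:
    """Check consecutive levels, ordered from coarsest to finest resolution."""
    return all(
        is_refinement(levels[i + 1], levels[i]) for i in range(len(levels) - 1)
    )


# Hypothetical toy data: six points clustered at three resolutions.
coarse = [0, 0, 0, 0, 1, 1]
medium = [0, 0, 1, 1, 2, 2]
fine = [0, 1, 2, 2, 3, 4]
# In `crossing`, cluster 1 spans two different `medium` clusters.
crossing = [0, 1, 1, 2, 2, 2]

print(is_nested_hierarchy([coarse, medium, fine]))
print(is_nested_hierarchy([coarse, medium, crossing]))
```

Under this assumed criterion, the first stack of label vectors is a valid hierarchy and the second is not, because one of its fine clusters crosses a boundary at the coarser level.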
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The proposed approach is novel and is well presented
- The evaluation experiments are extensive and well documented
- Performance on the bacterial species prediction task is demonstrably improved
- The proposed hierarchical configuration learning task is not well motivated or defined. The evaluation protocol does not seem to correspond directly to the stated task of configuration learning. The main example is in predicting bacterial species. Perhaps the whole approach is a better method for the specific domain rather than a general-purpose method.
- The module seems to be evaluated in a somewhat ad hoc manner. For example, in Figure 1, bird is clustered with horse and deer but cat
- The paper is clear about its motivation with sufficient significance and quality.
- The paper introduces a creative idea, mixing multi-resolution cluster configurations to capture global manifold structure in data.
- GraMixC can be easily attached to existing predictors, which makes it broadly applicable.
- The alignment procedure (RMS) and the configuration formulation are mathematically detailed, showing rigor behind the method.
- The attention maps, ablation studies, and qualitative example
- The paper is quite dense. Many sections, especially those describing the clustering and RMS steps, are mathematically overloaded and could use higher-level intuition or clearer separation for better understanding.
- The method largely builds upon existing clustering and attention mechanisms; the novelty lies mainly in combining and aligning them.
- The paper does not clearly quantify the computational overhead of multi-resolution clustering and RMS alignment compared to single-resolution or
The authors demonstrate that the method can help improve the performance of several prediction models on multiple datasets. The concept of using unsupervised learning to improve supervised tasks is interesting (although not new).
The paper is poorly written, hard to follow, and many details about the evaluations are unclear. The motivation isn’t explained, and the flow of the abstract and intro is bad. The paper's main message is convoluted and poorly conveyed. It is also not clear what in the paper is new, and what is a combination of existing methods. The authors don’t explain the computational overhead of this method compared with simply applying the downstream classifier/regressor. Same for training. The training p
- Despite clarity concerns noted below, the paper proposes an interesting and useful paradigm for integrating clustering results into downstream prediction models.
- Existing experiments appear to be well executed with a good mix of synthetic, benchmark, and realistic datasets. Performance improvements seem strong, although they need to be properly contextualized.
This paper is written with insufficient clarity for the generalist ICLR audience, in my opinion. Overall, the inaccessibility of the manuscript prevents a fully informed review, which is reflected in my confidence score. Detailed issues include:
- The paper assumes deep familiarity with community detection literature (BlueRed Front, modularity-based clustering, parallel-DT) without providing a background or related works section. For example, the term "configuration" is only defined in Section 3.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Complex Network Analysis Techniques
