CLUENet: Cluster Attention Makes Neural Networks Have Eyes
Xiangshuai Song, Jun-Jie Huang, Tianrui Liu, Ke Liang, Chang Tang

TL;DR
CLUENet introduces a transparent neural network architecture that combines clustering and attention mechanisms to improve interpretability, accuracy, and efficiency in visual semantic understanding tasks.
Contribution
The paper presents novel clustering-based attention modules, together with feature-dispatching and pooling strategies, that improve both model transparency and performance relative to existing methods.
Findings
Outperforms existing clustering and visual models on CIFAR-100 and Mini-ImageNet
Achieves a better balance of accuracy, efficiency, and interpretability
Enhances local modeling with temperature-scaled cosine attention and gated residuals
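The temperature-scaled cosine attention and gated residual named above can be illustrated with a minimal pure-Python sketch. It assumes a generic formulation: a softmax over query-key cosine similarities divided by a temperature `tau`, followed by a residual whose update is scaled by a sigmoid gate. The function names, the single scalar `gate_logit`, and the default `tau=0.1` are hypothetical illustrations, not CLUENet's actual implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temp_cosine_attention(query: Vector, keys: List[Vector],
                          values: List[Vector], tau: float = 0.1) -> Vector:
    """Attend from one query to all keys with weights softmax(cos(q, k) / tau).
    Cosine scores lie in [-1, 1], so the temperature tau controls sharpness."""
    weights = softmax([cosine(query, k) / tau for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def gated_residual(x: Vector, update: Vector, gate_logit: float) -> Vector:
    """Hypothetical gated residual: x + sigmoid(gate_logit) * update.
    The identity term x is always kept; the gate only scales the update."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [xi + g * ui for xi, ui in zip(x, update)]


# Usage: a query aligned with the first key attends almost entirely to it.
x = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
attended = temp_cosine_attention(x, keys, values, tau=0.1)
out = gated_residual(x, attended, gate_logit=0.0)  # gate = sigmoid(0) = 0.5
print(out)
```

Lowering `tau` sharpens the attention weights toward the most similar key, while raising it spreads them out. Because the identity term is always preserved, gradients can pass through the residual path whatever the gate value; this is a general property of residual connections, not a claim about CLUENet's specific design.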
Abstract
Despite the success of convolution- and attention-based models in vision tasks, their rigid receptive fields and complex architectures limit their ability to model irregular spatial patterns and hinder interpretability, posing challenges for tasks that require high model transparency. Clustering paradigms offer promising interpretability and flexible semantic modeling, but suffer from limited accuracy, low efficiency, and vanishing gradients during training. To address these issues, we propose the CLUster attEntion Network (CLUENet), a transparent deep architecture for visual semantic understanding. Its three key innovations are (i) Global Soft Aggregation and Hard Assignment with Temperature-Scaled Cosine Attention and gated residual connections for enhanced local modeling, (ii) inter-block Hard and Shared Feature Dispatching, and (iii) an improved cluster pooling…
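To make "Global Soft Aggregation and Hard Assignment" more concrete, here is an illustrative sketch of one plausible reading: every cluster center is refreshed as a softmax-weighted mean over all points (soft and global), and each point is then assigned to its single most similar center (hard). The abstract does not give CLUENet's equations, so the helper names `soft_aggregate` and `hard_assign`, the cosine scoring, and the temperature value are assumptions for illustration only.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_aggregate(points: List[Vector], centers: List[Vector],
                   tau: float = 0.1) -> List[Vector]:
    """Hypothetical global soft aggregation: each center becomes a
    softmax-weighted mean over ALL points, scored by cosine similarity / tau."""
    dim = len(points[0])
    new_centers = []
    for c in centers:
        w = softmax([cosine(c, p) / tau for p in points])
        new_centers.append([sum(wi * p[d] for wi, p in zip(w, points)) for d in range(dim)])
    return new_centers


def hard_assign(points: List[Vector], centers: List[Vector]) -> List[int]:
    """Hard assignment: index of the most cosine-similar center for each point."""
    return [max(range(len(centers)), key=lambda j: cosine(p, centers[j]))
            for p in points]


# Usage: two loose groups of 2-D features and two initial centers.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = soft_aggregate(points, [[1.0, 0.0], [0.0, 1.0]])
print(hard_assign(points, centers))  # → [0, 0, 1, 1]
```

The soft step keeps every point differentiably involved in updating each center, while the hard step yields a crisp, inspectable partition of points; this pairing is one way to read the interpretability argument in the abstract, not a reproduction of the paper's module.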
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Visual Attention and Saliency Detection
