Image as Set of Points
Xu Ma, Yuqian Zhou, Huan Wang, Can Qin, Bin Sun, Chang Liu, Yun Fu

TL;DR
This paper introduces Context Clusters, a novel image representation paradigm that treats images as sets of points and uses clustering for feature extraction, offering interpretability and competitive performance without convolution or attention mechanisms.
Contribution
The paper proposes Context Clusters, a simple clustering-based method for visual representation that is convolution- and attention-free, providing new insights and broad applicability.
Findings
Achieves comparable or better results than ConvNets and ViTs on benchmarks.
Provides interpretable visualization of clustering process.
Offers a new perspective on image representation using unorganized points.
Abstract
What is an image and how to extract latent features? Convolutional Networks (ConvNets) consider an image as organized pixels in a rectangular shape and extract features via convolutional operation in local region; Vision Transformers (ViTs) treat an image as a sequence of patches and extract features via attention mechanism in a global range. In this work, we introduce a straightforward and promising paradigm for visual representation, which is called Context Clusters. Context clusters (CoCs) view an image as a set of unorganized points and extract features via simplified clustering algorithm. In detail, each point includes the raw feature (e.g., color) and positional information (e.g., coordinates), and a simplified clustering algorithm is employed to group and extract deep features hierarchically. Our CoCs are convolution- and attention-free, and only rely on clustering algorithm for…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection
