Unsupervised Learning via Network-Aware Embeddings
Anne Sophie Riis Damstrup, Sofie Tosti Madsen, Michele Coscia

TL;DR
This paper introduces a novel method for creating network-aware embeddings of node attributes to improve clustering in complex, interconnected data, demonstrating benefits across various real-world applications.
Contribution
It proposes a new approach that generates network-aware embeddings of node attributes, addressing a gap in existing clustering methods by explicitly incorporating network structure.
Findings
Network-aware embeddings improve clustering performance.
Method scales to large networks.
Provides actionable insights in diverse fields.
Abstract
Data clustering, the task of grouping observations according to their similarity, is a key component of unsupervised learning, with real-world applications in diverse fields such as biology, medicine, and social science. In these fields the data often comes with complex interdependencies between the dimensions of analysis: for instance, the various characteristics and opinions people can have live on a complex social network. Current clustering methods are ill-suited to tackle this complexity: deep learning can approximate these dependencies, but cannot take their explicit map as the input of the analysis. In this paper, we aim to fix this blind spot in the unsupervised learning literature. We create network-aware embeddings by estimating the network distance between numeric node attributes via the generalized Euclidean distance. Unlike all methods in the literature that…
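The abstract's core step, a network distance between numeric node attributes, can be illustrated with a minimal sketch. It assumes the generalized Euclidean formulation of Coscia (2020). In that formulation, two attribute vectors p and q defined on the n nodes of a connected, undirected graph are compared as δ(p, q) = sqrt((p - q)ᵀ L⁺ (p - q)), where L⁺ is the Moore-Penrose pseudoinverse of the graph Laplacian L. The helper names (`laplacian`, `solve`, `generalized_euclidean`) are hypothetical. The code also does not form L⁺ explicitly. Instead, it relies on the identity (L + J/n)⁻¹ = L⁺ + J/n, which holds for connected graphs, where J is the all-ones matrix.

```python
import math


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of a simple, undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: is the graph connected?")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def generalized_euclidean(p, q, n, edges):
    """sqrt((p - q)^T L^+ (p - q)) on a connected graph (assumed formulation)."""
    L = laplacian(n, edges)
    # For a connected graph, L + J/n is invertible and its inverse is L^+ + J/n.
    A = [[L[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    d = [pi - qi for pi, qi in zip(p, q)]
    mean = sum(d) / n
    d = [di - mean for di in d]  # drop the constant component, which L^+ maps to 0
    x = solve(A, d)  # equals L^+ d because d now sums to zero
    return math.sqrt(max(0.0, sum(di * xi for di, xi in zip(d, x))))
```

On the path 0-1-2, `generalized_euclidean([1, 0, 0], [0, 0, 1], 3, [(0, 1), (1, 2)])` returns approximately √2. On a triangle, where nodes 0 and 2 are adjacent, the same two vectors come out closer, at about √(2/3). In other words, the distance grows with how far apart the differing values sit in the network. To compare many observations on one graph, it would be cheaper to factor the system once, or to use a sparse solver on large graphs, than to re-solve from scratch for every pair.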
Peer Reviews
Decision · Submitted to ICLR 2024
This paper proposes a pipeline to reduce the dimensions of node features.
1. The biggest problem with this paper is that the methodological innovation is insufficient. In short, the authors propose a distance metric that considers graph structure based on the scheme of Coscia et al. (2020) and apply it to t-SNE to reduce feature dimensions, thereby achieving the goal of clustering node attributes. From my perspective, this does not meet the standard of a paper presented at a prestigious conference. 2. The expression in this paper is not concise enough. It…
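The review above summarizes the pipeline as three stages: compute a graph-aware distance, run t-SNE on it, then cluster. The sketch below illustrates only the final stage, and it does so under a plain substitution. Rather than t-SNE followed by a clustering algorithm, it runs a simple k-medoids (Voronoi iteration) directly on a precomputed distance matrix, which keeps the example dependency-free. This is not the authors' pipeline, and `k_medoids` is a hypothetical helper. With scikit-learn available, one could instead pass the same matrix to `TSNE(metric="precomputed", init="random")` and then cluster the low-dimensional output.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster n items from an n x n symmetric distance matrix (Voronoi-iteration k-medoids)."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")

    # Deterministic farthest-first initialization: start from item 0, then
    # repeatedly add the unchosen item farthest from its nearest medoid.
    medoids = [0]
    while len(medoids) < k:
        candidates = (i for i in range(n) if i not in medoids)
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new == medoids:  # converged
            break
        medoids = new
    return assign(medoids), medoids
```

For example, if four items sit at positions 0, 1, 10 and 11 on a line and `dist` holds their absolute differences, `k_medoids(dist, 2)` groups the first two items and the last two. In the pipeline the review describes, `dist` would instead hold the pairwise network-aware distances between observations.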
The paper appears to propose a novel way to generate clusters through embeddings.
The main idea in itself is hardly novel, as embedding-based clustering has been around for a long time. In the experimental study, the paper compares against early embedding methods like node2vec, which have been superseded by more recent work in the literature. Thus it is hard to judge the success of the method in experimental terms. More importantly, it is hard to judge this paper because the problem definition is written in a manner that is hard to make sense of. It is claimed that t…
(1) The paper's main strength is the novelty of the problem setting. I am not aware of any other approach to clustering that attempts to use an explicit graph over the dimensions. This is an interesting setting and worth exploring. (2) The paper's simulation study and real data studies are (each) thorough, well-described, and appropriate for evaluation.
I will elaborate on these high-level weaknesses (with evidence) in my questions to the authors. (1) The paper lacks a clear understanding of how each of the individual modules is used. In some places, there is reason to question the soundness of the proposed modules. (2) The proposed pipeline has unclear efficacy on the real-world datasets. It is not clear to practitioners whether the proposed approach should be used in general, or whether the baselines would suffice. (3) The author…
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Stream Mining Techniques
