CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling
Ling Zhan, Tao Jia

TL;DR
This paper introduces CoarSAS2hvec, a novel HIN embedding method that uses balanced sampling and coarsening to better capture network information, outperforming existing methods in multiple tasks.
Contribution
The paper proposes CoarSAS2hvec, a new HIN embedding approach that addresses sampling imbalance with a coarsening procedure and an optimized loss function, improving representation quality.
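This summary does not spell out the optimized loss. For orientation, the sketch below shows the standard skip-gram negative-sampling (SGNS) objective that random-walk embedding methods conventionally train on. It is the "traditional" baseline, not the paper's optimized loss. The names `log_sigmoid`, `dot`, and `sgns_loss` are hypothetical, and vectors are plain Python lists to keep the example dependency-free.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def log_sigmoid(x: float) -> float:
    # Numerically stable log(1 / (1 + e^{-x})), split by sign to avoid overflow.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def sgns_loss(center: Vector, context: Vector, negatives: List[Vector]) -> float:
    """Standard SGNS loss for one (center, context) pair plus k negative samples.

    Pulls the true context toward the center node and pushes negatives away.
    """
    pos = log_sigmoid(dot(center, context))
    neg = sum(log_sigmoid(-dot(n, center)) for n in negatives)
    return -(pos + neg)

u = [1.0, -0.5, 2.0]
neg_u = [-x for x in u]
aligned = sgns_loss(u, u, [neg_u, neg_u])      # context agrees, negatives disagree
misaligned = sgns_loss(u, neg_u, [u, u])       # the reverse situation
print(aligned < misaligned)  # True
```

The loss is lower when the embedding agrees with observed context and disagrees with sampled negatives. This objective depends on which (center, context) pairs the sampler produces, which is why sample balance matters.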
Findings
Outperforms nine other methods on four real-world datasets.
Samples collected by CoarSAS contain richer information with higher entropy.
Traditional loss functions applied to CoarSAS samples yield better results.
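One way to make the entropy finding concrete is to compute the Shannon entropy of node-occurrence frequencies across a set of sampled sequences. When a few hubs dominate, the entropy drops. When visits are spread more evenly, it rises. The helper `sample_entropy` and the choice of natural-log entropy over raw node counts are illustrative assumptions. The paper's exact entropy measure may differ.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

def sample_entropy(sequences: Iterable[Sequence[str]]) -> float:
    """Shannon entropy (in nats) of node-occurrence frequencies across sequences.

    Hypothetical helper: higher values mean visits are spread more evenly over nodes.
    """
    counts = Counter(node for seq in sequences for node in seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# A hub-dominated sample versus a more balanced one (toy data).
hub_heavy = [["hub", "a", "hub", "b"], ["hub", "c", "hub", "a"]]
balanced = [["hub", "a", "b", "c"], ["d", "a", "e", "c"]]
print(sample_entropy(hub_heavy) < sample_entropy(balanced))  # True
```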
Abstract
Heterogeneous information network (HIN) embedding aims to find node representations that preserve the proximity between entities of different types. A widely adopted family of approaches applies random walks to generate sequences of heterogeneous context, from which the embedding is learned. However, because of the multipartite graph structure of HINs, hub nodes tend to be over-represented in the sampled sequences, which yields an imbalanced sample of the network. Here we propose a new embedding method, CoarSAS2hvec. It uses self-avoid short sequence sampling combined with an HIN coarsening procedure (CoarSAS) to better collect the rich information in the HIN. An optimized loss function is used to improve the performance of the HIN structure embedding. CoarSAS2hvec outperforms nine other methods in two different tasks on four real-world data sets. The ablation study confirms that…
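The hub imbalance can be illustrated with a toy author-paper-venue graph in which every paper links to one shared venue node `V`. A plain random walk keeps returning to `V` within a single sequence. A short walk that never revisits a node touches `V` at most once. This is a minimal sketch that assumes "self-avoid short sequence sampling" behaves like a generic self-avoiding random walk truncated at a short length. The coarsening step and the paper's actual sampler are not reproduced. The names `toy_hin`, `random_walk`, and `self_avoiding_short_walk` are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list of an undirected HIN

def toy_hin(n: int = 5) -> Graph:
    """Hypothetical HIN: authors a_i -- papers p_i -- one shared hub venue V."""
    g: Graph = {"V": []}
    for i in range(n):
        a, p = f"a{i}", f"p{i}"
        g[a] = [p]
        g[p] = [a, "V"]
        g["V"].append(p)
    return g

def random_walk(g: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Plain random walk: nodes may be revisited, so hubs recur within a sequence."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(g[walk[-1]]))
    return walk

def self_avoiding_short_walk(g: Graph, start: str, max_len: int,
                             rng: random.Random) -> List[str]:
    """Short walk that never revisits a node; stops early at a dead end."""
    walk, seen = [start], {start}
    while len(walk) < max_len:
        candidates = [n for n in g[walk[-1]] if n not in seen]
        if not candidates:
            break
        nxt = rng.choice(candidates)
        walk.append(nxt)
        seen.add(nxt)
    return walk

rng = random.Random(0)
g = toy_hin()
plain = [random_walk(g, f"a{i}", 20, rng) for i in range(5)]
short = [self_avoiding_short_walk(g, f"a{i}", 5, rng) for i in range(5)]
print("hub visits per plain walk:", [w.count("V") for w in plain])
print("hub visits per short walk:", [w.count("V") for w in short])  # [1, 1, 1, 1, 1]
```

In this toy graph every self-avoiding walk has the form `a_i, p_i, V, p_j, a_j`, so the hub appears exactly once per sequence. The plain walks typically include it several times.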
