Document Network Embedding: Coping for Missing Content and Missing Links

Jean Dupuy; Adrien Guille; Julien Jacques

arXiv:1912.03048 · cs.IR · December 9, 2019

Open Access

TL;DR

This paper introduces a method to estimate missing document or link information in document networks by learning a linear transformation between content and node representations, improving retrieval and link prediction.

Contribution

It proposes a novel linear transformation approach, inspired by machine translation, to handle missing content or links in document network embeddings.

Findings

01. Enhanced prediction of unobserved node neighborhoods.

02. Improved retrieval of similar documents with missing content.

03. Efficient estimation of missing representations using SVD.

Abstract

Searching through networks of documents is an important task. A promising path to improve the performance of information retrieval systems in this context is to leverage dense node and content representations learned with embedding techniques. However, these techniques cannot learn representations for documents that are either isolated or whose content is missing. To tackle this issue, assuming that the topology of the network and the content of the documents correlate, we propose to estimate the missing node representations from the available content representations, and conversely. Inspired by recent advances in machine translation, we detail in this paper how to learn a linear transformation from a set of aligned content and node representations. The projection matrix is efficiently calculated in terms of the singular value decomposition. The usefulness of the proposed method is…
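
The abstract says the projection matrix is computed from a singular value decomposition of aligned content and node representations. A minimal sketch of one common way to do this follows. It assumes the transformation is constrained to be orthogonal (the orthogonal Procrustes problem, often used to align word embeddings in machine translation); the abstract as shown does not confirm that constraint. The function name `fit_orthogonal_map` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def fit_orthogonal_map(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    Assumption: an orthogonality constraint on W (orthogonal Procrustes).
    Rows of `source` and `target` are aligned representations of the
    same documents.
    """
    # SVD of the cross-covariance between the two aligned spaces.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Synthetic stand-ins for learned embeddings (not data from the paper).
rng = np.random.default_rng(0)
d = 8
content = rng.normal(size=(50, d))                    # content representations
true_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
nodes = content @ true_rotation                       # node representations

# Fit on the 40 documents that have both representations.
W = fit_orthogonal_map(content[:40], nodes[:40])

# Estimate node representations for the 10 documents without links.
estimated_nodes = content[40:] @ W
```

Under the orthogonality assumption, the transpose `W.T` maps in the other direction, so `nodes @ W.T` would estimate content representations for documents whose content is missing.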

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies