hyperdoc2vec: Distributed Representations of Hypertext Documents
Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang

TL;DR
Hyperdoc2vec is an embedding method for hypertext documents, such as web pages and academic papers, that preserves information conventional text embedding methods lose when applied directly to hyper-documents. It outperforms competing models on paper classification and citation recommendation.
Contribution
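The paper introduces hyperdoc2vec, a general embedding approach for hyper-documents, together with four criteria characterizing the information that hyper-document embedding models should preserve. It compares hyperdoc2vec systematically against several competitors in the academic paper domain.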
Findings
Hyperdoc2vec outperforms the compared models on paper classification.
Hyperdoc2vec outperforms the compared models on citation recommendation.
Analyses and experiments both indicate that hyperdoc2vec satisfies the four criteria better than the other models.
Abstract
Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although being effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing necessary information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks, i.e., paper classification and citation recommendation, in the academic paper domain. Analyses and experiments both validate the superiority of hyperdoc2vec to other models w.r.t. the four criteria.
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
