Coherence-Based Distributed Document Representation Learning for Scientific Documents
Shicheng Tan, Shu Zhao, Yanping Zhang

TL;DR
This paper introduces a document embedding method that models the coherence of scientific documents through coupled text pairs, improving tasks such as information retrieval and recommendation.
Contribution
The paper proposes the coupled text pair embedding (CTPE) model, a new approach to scientific document representation that incorporates document coherence through coupled text pairs.
Findings
The CTPE model outperforms baseline methods in information retrieval.
The model improves recommendation task accuracy.
Experimental results validate the effectiveness of coherence-based embeddings.
Abstract
Distributed document representation is one of the basic problems in natural language processing. Current distributed document representation methods mainly consider the context information of words or sentences. These methods do not take into account the coherence of the document as a whole, e.g., the relation between a paper's title and abstract, a headline and its description, or adjacent body passages in the document. Coherence indicates whether a document is meaningful, both logically and syntactically, especially in scientific documents (papers, patents, etc.). In this paper, we propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents, which maintains the coherence of the document with coupled text pairs formed by segmenting the document. First, we divide the document into two parts (e.g., title and abstract) that form a coupled text…
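To make the segmentation step concrete, here is a minimal sketch of how coupled text pairs might be built from one document, assuming a document is stored as a title, an abstract, and a list of body paragraphs. The `Document` class and the `coupled_text_pairs` function are hypothetical names, and pairing adjacent body paragraphs is an assumed granularity based on the abstract's mention of "adjacent bodies". The sketch covers only pair construction; the embedding model and its training objective are not described in the (truncated) abstract and are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Document:
    # Hypothetical container for a scientific document (not from the paper).
    doc_id: str
    title: str
    abstract: str
    body: List[str] = field(default_factory=list)  # paragraphs in reading order


def coupled_text_pairs(doc: Document, include_body: bool = True) -> List[Tuple[str, str]]:
    """Segment one document into coupled (first part, second part) text pairs.

    Assumption: the title/abstract split forms one pair, and each pair of
    adjacent body paragraphs forms another pair.
    """
    pairs: List[Tuple[str, str]] = []
    # Title and abstract: the main two-part split named in the abstract.
    if doc.title and doc.abstract:
        pairs.append((doc.title, doc.abstract))
    if include_body:
        # Adjacent body paragraphs (zip stops at the shorter sequence,
        # so a body with fewer than two paragraphs yields no body pairs).
        for left, right in zip(doc.body, doc.body[1:]):
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    doc = Document("d1", "Paper title", "Paper abstract", ["p1", "p2", "p3"])
    for first, second in coupled_text_pairs(doc):
        print(first, "->", second)
```

Running the example yields three pairs: the title with the abstract, then `p1`/`p2` and `p2`/`p3`. Keeping both parts of each pair from the same document is what lets a downstream model treat the pair as a unit of coherence rather than as isolated words or sentences.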
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques
