TL;DR
This paper introduces RPT, a multi-task self-supervised pre-training model that effectively captures researcher data from heterogeneous sources, enabling transferability across multiple academic data mining tasks.
Contribution
The paper presents a novel hierarchical Transformer-based pre-training framework for researcher data, with two transfer modes for diverse downstream applications.
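To make the word "hierarchical" concrete, here is a minimal sketch of two-level encoding: token embeddings are pooled into one vector per document, and document vectors are then pooled into one vector per researcher. It is an illustration under stated assumptions, not RPT's actual architecture. Mean pooling and a single softmax-weighted attention pooling step stand in for full Transformer layers, and the names `mean_pool`, `attention_pool`, `encode_researcher`, and the `query` vector are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attention_pool(vectors, query):
    """Softmax-weighted sum of vectors, scored by scaled dot product with query."""
    scale = math.sqrt(len(query))
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) / scale for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]
    return pooled, weights


def encode_researcher(documents, query):
    """Level 1: tokens -> document vector. Level 2: document vectors -> researcher vector."""
    doc_vectors = [mean_pool(tokens) for tokens in documents]
    return attention_pool(doc_vectors, query)


# Toy input (assumed shapes): a researcher with two documents of 2-d token embeddings.
documents = [
    [[1.0, 0.0], [0.0, 1.0]],  # document 1: two token embeddings
    [[2.0, 2.0]],              # document 2: one token embedding
]
researcher_vec, doc_weights = encode_researcher(documents, query=[1.0, 1.0])
```

In a pre-trained model the `query` and the pooling would be learned; here they are fixed by hand only to show how information flows from tokens to documents to a single researcher representation.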
Findings
RPT improves performance on three downstream researcher data mining tasks.
Pre-training with RPT enhances transferability across different research scenarios.
Extensive experiments validate the effectiveness of the proposed model.
Abstract
With the growth of academic engines, the mining and analysis of massive researcher data, for tasks such as collaborator recommendation and researcher retrieval, has become indispensable. It can improve the service quality and intelligence of academic engines. Most existing studies on researcher data mining focus on a single task for a particular application scenario and learn a task-specific model, which usually cannot transfer to out-of-scope tasks. Pre-training provides a generalized, shared model that captures valuable information from enormous unlabeled data. The model can then accomplish multiple downstream tasks after a few fine-tuning steps. In this paper, we propose a multi-task self-supervised learning-based researcher data pre-training model named RPT. Specifically, we divide the researchers' data into semantic document sets and community…
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Residual Connection · Adam · Label Smoothing · Byte Pair Encoding · Dropout
