SsciBERT: A Pre-trained Language Model for Social Science Texts

Si Shen; Jiangfeng Liu; Litao Lin; Ying Huang; Lin Zhang; Chang Liu; Yutong Feng; Dongbo Wang

arXiv:2206.04510 · cs.CL · November 28, 2022 · Scientometrics

TL;DR

SsciBERT is a specialized pre-trained language model designed for social science texts, enhancing tasks like classification and entity recognition, and filling a gap in domain-specific NLP tools for social sciences.

Contribution

The paper introduces SsciBERT, the first pre-trained language model specifically for social science literature, improving NLP task performance in this domain.

Findings

01 Excellent performance on discipline classification
02 Effective in abstract structure-function recognition
03 High accuracy in named entity recognition

Abstract

The academic literature of the social sciences records human civilization and studies human social problems. As this literature grows rapidly, finding existing research on relevant issues quickly has become an urgent need for researchers. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts can improve performance on natural language processing tasks. However, no pre-trained language model for the social sciences has been available so far. In light of this, the present research proposes a pre-trained model based on the abstracts published in Social Science Citation Index (SSCI) journals. The models, which are available on GitHub (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification, abstract structure-function recognition, and named entity recognition tasks with the social…
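
The abstract says the model is pre-trained on SSCI abstracts but this page does not state the training objective. As a rough illustration only, the sketch below assumes the standard BERT masked-language-model recipe (select about 15% of tokens; of those, replace 80% with a mask symbol, 10% with a random vocabulary token, and leave 10% unchanged). The function name `mask_tokens`, the toy sentence, and the whitespace tokenization are hypothetical and are not taken from the SSCI-BERT repository.

```python
import random

MASK = "[MASK]"  # placeholder symbol; real BERT vocabularies define their own mask token


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list in the style of BERT masked-language-model training.

    Returns (corrupted, labels). labels[i] is the original token at positions
    selected for prediction and None everywhere else.
    Assumption: the 80/10/10 split below is the generic BERT recipe, not a
    setting confirmed for SsciBERT.
    """
    rng = random.Random(seed)  # local generator keeps the example reproducible
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model would be trained to recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)  # most selected tokens become the mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # some become a random token
            else:
                corrupted.append(tok)  # the rest stay unchanged
        else:
            labels.append(None)  # position not used in the loss
            corrupted.append(tok)
    return corrupted, labels


if __name__ == "__main__":
    # Toy "abstract" split on whitespace; a real pipeline would use a subword tokenizer.
    sentence = "social capital shapes civic participation in urban communities".split()
    vocab = sorted(set(sentence))
    corrupted, labels = mask_tokens(sentence, vocab, mask_prob=0.15, seed=13)
    print(corrupted)
    print(labels)
```

In domain-adaptive pre-training of this kind, the corrupted sequences would be fed to the model and the loss computed only at positions where a label is present, so the model learns the vocabulary and phrasing of the target domain.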

Code & Models

Repositories

s-t-full-text-knowledge-mining/ssci-bert (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods