SsciBERT: A Pre-trained Language Model for Social Science Texts
Si Shen, Jiangfeng Liu, Litao Lin, Ying Huang, Lin Zhang, Chang Liu, Yutong Feng, Dongbo Wang

TL;DR
SsciBERT is a pre-trained language model specialized for social science texts. It improves performance on tasks such as discipline classification and named entity recognition, filling a gap in domain-specific NLP tools for the social sciences.
Contribution
The paper introduces SsciBERT, the first pre-trained language model specifically for social science literature, improving NLP task performance in this domain.
Findings
Excellent performance on discipline classification
Effective in abstract structure-function recognition
High accuracy in named entity recognition
Abstract
The academic literature of the social sciences records human civilization and studies human social problems. As this literature grows at scale, quickly finding existing research on relevant issues has become an urgent need for researchers. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts can improve performance on natural language processing tasks. However, no pre-trained language model for the social sciences has been available so far. In light of this, the present research proposes pre-trained models based on the abstracts published in Social Science Citation Index (SSCI) journals. The models, which are available on GitHub (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification, abstract structure-function recognition, and named entity recognition tasks with the social…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
