Advances in Pre-trained Language Models for Domain-Specific Text Classification: A Systematic Review
Zhyar Rzgar K. Rostam, Gábor Kertész

TL;DR
This systematic review analyzes how pre-trained language models like BERT variants are used for domain-specific text classification, highlighting recent advancements, challenges, and future research directions across various specialized fields.
Contribution
The paper provides a comprehensive taxonomy of techniques, compares model performances in domain-specific tasks, and offers insights into the evolution and challenges of PLMs in specialized NLP applications.
Findings
Transformer-based models outperform traditional methods in domain-specific classification.
BioBERT and SciBERT show superior performance in biomedical tasks.
Challenges include vocabulary mismatch and data imbalance in specialized domains.
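The two challenges named in the last finding can be made concrete with a small sketch. It is an illustration under stated assumptions, not a method taken from the reviewed papers: `oov_rate` and `inverse_frequency_weights` are hypothetical helper names, the vocabulary and labels are toy data, and the whole-word lookup is a simplification, since subword tokenizers such as WordPiece split unseen terms into fragments rather than rejecting them.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not appear in the given vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy data (assumed for illustration only).
general_vocab = {"the", "patient", "was", "given", "a", "dose", "of"}
domain_tokens = ["the", "patient", "was", "given", "a", "dose", "of", "pembrolizumab"]
rate = oov_rate(domain_tokens, general_vocab)

labels = ["common"] * 8 + ["rare"] * 2
weights = inverse_frequency_weights(labels)

print(rate)     # share of domain tokens missing from the general vocabulary
print(weights)  # rarer classes receive larger weights
```

Weights of this kind are typically passed to a weighted loss during fine-tuning so that errors on the rare class count for more; whether that helps in a given domain is an empirical question.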
Abstract
The exponential increase in scientific literature and online information necessitates efficient methods for extracting knowledge from textual data. Natural language processing (NLP) plays a crucial role in addressing this challenge, particularly in text classification tasks. While large language models (LLMs) have achieved remarkable success in NLP, their accuracy can suffer in domain-specific contexts due to specialized vocabulary, unique grammatical structures, and imbalanced data distributions. In this systematic literature review (SLR), we investigate the utilization of pre-trained language models (PLMs) for domain-specific text classification. We systematically review 41 articles published between 2018 and January 2024, adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. This review methodology involved rigorous inclusion criteria…
