Fine-Tuned Language Models for Domain-Specific Summarization and Tagging
Jun Wang, Fuming Lin, Yuyu Chen

TL;DR
This paper introduces a pipeline combining fine-tuned large language models with named entity recognition to improve domain-specific text summarization and tagging, especially in evolving sub-cultural languages and slang.
Contribution
It demonstrates that instruction fine-tuning enhances summarization and tagging accuracy across general and domain-specific datasets, with transferability of reasoning capabilities across languages.
Findings
Instruction fine-tuning significantly improves summarization and tagging accuracy, as measured by BLEU and ROUGE.
Domain-specific fine-tuning outperforms general models.
Models effectively support real-time information management.
Abstract
This paper presents a pipeline integrating fine-tuned large language models (LLMs) with named entity recognition (NER) for efficient domain-specific text summarization and tagging. The authors address the challenge posed by rapidly evolving sub-cultural languages and slang, which complicate automated information extraction and law enforcement monitoring. By leveraging the LLaMA Factory framework, the study fine-tunes LLMs on both general-purpose and custom domain-specific datasets, particularly in the political and security domains. The models are evaluated using BLEU and ROUGE metrics, demonstrating that instruction fine-tuning significantly enhances summarization and tagging accuracy, especially for specialized corpora. Notably, the LLaMA3-8B-Instruct model, despite its initial limitations in Chinese comprehension, outperforms its Chinese-trained counterpart after domain-specific…
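To make the evaluation metrics mentioned in the abstract concrete, here is a minimal sketch of n-gram overlap scoring in the style of ROUGE-N. It is not the authors' evaluation code. The function names (`ngram_counts`, `rouge_n`) and the example sentences are hypothetical, and whitespace tokenization is an assumption that would have to be replaced by character or word segmentation for Chinese text. The precision term is the same clipped overlap that BLEU builds on; full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """ROUGE-N style precision, recall, and F1 for one reference/candidate pair.

    Assumption: whitespace tokenization; swap in a segmenter for Chinese.
    """
    ref = ngram_counts(reference.split(), n)
    cand = ngram_counts(candidate.split(), n)
    # Counter intersection clips each n-gram to its minimum count on both sides.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative strings only; these are not taken from the paper's datasets.
scores = rouge_n(
    "the model summarizes security reports",
    "the model tags security reports",
)
print(scores)
```

In this example four of the five unigrams match, so precision and recall are both 4/5. Production evaluations would normally rely on an established metrics package rather than a hand-rolled function like this one.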
