A Domain-Adapted Pipeline for Structured Information Extraction from Police Incident Announcements on Social Media
Mengfan Shen, Kangqi Song, Xindi Wang, Wei Jia, Tao Wang, Ziqiang Han

TL;DR
This paper presents a domain-adapted pipeline that combines prompt engineering with LoRA fine-tuning of a language model to extract structured information from noisy police social media posts, reaching over 98% accuracy in mortality detection and over 95% Exact Match on fatality counts and location.
Contribution
It introduces a novel pipeline combining prompt engineering and LoRA fine-tuning of Qwen2.5-7B for effective information extraction from social media texts.
Findings
Achieved over 98.36% accuracy in mortality detection.
Attained 95.31% Exact Match for fatality counts.
Reached 95.54% Exact Match in location extraction.
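For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and Exact Match are commonly computed. The whitespace and case normalization in `normalize` is an assumption made for illustration; the summary does not state what normalization, if any, the authors applied.

```python
def accuracy(preds, golds):
    """Fraction of labels (e.g. fatality yes/no) predicted correctly."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def normalize(value):
    """Assumed normalization: collapse whitespace and lowercase the string form."""
    return " ".join(str(value).split()).lower()


def exact_match(preds, golds):
    """Fraction of predictions whose normalized text equals the gold text."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)
```

Exact Match gives no partial credit: a location string that differs from the annotation by a single character counts as a miss, which makes it a stricter measure than token-overlap scores.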
Abstract
Structured information extraction from police incident announcements is crucial for timely and accurate data processing, yet presents considerable challenges due to the variability and informal nature of textual sources such as social media posts. To address these challenges, we developed a domain-adapted extraction pipeline that leverages targeted prompt engineering with parameter-efficient fine-tuning of the Qwen2.5-7B model using Low-Rank Adaptation (LoRA). This approach enables the model to handle noisy, heterogeneous text while reliably extracting 15 key fields, including location, event characteristics, and impact assessment, from a high-quality, manually annotated dataset of 4,933 instances derived from 27,822 police briefing posts on Chinese Weibo (2019-2020). Experimental results demonstrated that LoRA-based fine-tuning significantly improved performance over both the base and…
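The prompt-engineering side of such a pipeline can be sketched as a prompt template plus a tolerant parser for the model's reply. This is an illustrative sketch, not the authors' implementation: the three field names in `FIELDS`, the prompt wording, and the helpers `build_prompt` and `parse_response` are all hypothetical, and the paper's actual 15 fields are not listed in this summary.

```python
import json

# Hypothetical subset of fields; the paper extracts 15, not enumerated here.
FIELDS = ("location", "has_fatality", "fatality_count")


def build_prompt(post: str) -> str:
    """Build an instruction prompt asking for one JSON object per post."""
    keys = ", ".join(f'"{name}"' for name in FIELDS)
    return (
        "Extract the following fields from the police announcement and "
        f"answer with a single JSON object with keys {keys}. "
        "Use null for any field the text does not state.\n\n"
        f"Announcement: {post}\nJSON:"
    )


def parse_response(raw: str) -> dict:
    """Pull the outermost {...} span from the reply and keep only known fields."""
    empty = {name: None for name in FIELDS}
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {name: data.get(name) for name in FIELDS}
```

Restricting the output to a fixed key set and falling back to nulls on malformed replies keeps downstream scoring well defined even when the model adds commentary around the JSON.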
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Data-Driven Disease Surveillance
