System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection

Binglin Wu; Jiaxiu Zou; Xianneng Li

arXiv:2512.09563 · cs.CL · December 11, 2025

Open Access

TL;DR

This paper introduces a three-stage LLM-based framework for detecting fine-grained Chinese hate speech, combining prompt engineering, supervised fine-tuning, and model merging to improve robustness and accuracy on social media data.

Contribution

The paper presents a multi-stage approach, combining prompt engineering, supervised fine-tuning, and model merging, that captures implicit hate patterns and improves domain adaptation for Chinese hate speech detection.

Findings

01 Outperforms baseline methods on the STATE-ToxiCN benchmark

02 Improves robustness against out-of-distribution cases

03 Enhances detection of context-dependent hate speech

Abstract

The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. To bridge this gap, we propose a novel three-stage LLM-based framework: Prompt Engineering, Supervised Fine-tuning, and LLM Merging. First, context-aware prompts are designed to guide LLMs in extracting implicit hate patterns. Next, task-specific features are integrated during supervised fine-tuning to enhance domain adaptation. Finally, merging fine-tuned LLMs improves robustness against out-of-distribution cases. Evaluations on the STATE-ToxiCN benchmark validate the framework's effectiveness, demonstrating superior performance over baseline methods in detecting fine-grained hate speech.
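
The merging stage described above can be illustrated with a small sketch. The abstract does not say which merging algorithm the authors use, so the example below assumes the simplest common choice: a weighted average of parameters across fine-tuned checkpoints that share one architecture. The function name `merge_state_dicts`, the plain-dict representation of weights, and the parameter name `layer.weight` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_state_dicts(
    state_dicts: Sequence[StateDict],
    weights: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge checkpoints by weighted parameter averaging.

    ASSUMPTION: linear averaging is used here only as an illustration;
    the paper's actual merging method is not stated in the abstract.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint to merge")
    if weights is None:
        # Default to a uniform average over all checkpoints.
        weights = [1.0] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    # All checkpoints must expose the same parameters with the same sizes.
    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        if set(sd) != set(reference):
            raise ValueError("checkpoints have different parameter names")

    merged: StateDict = {}
    for name, ref_values in reference.items():
        if any(len(sd[name]) != len(ref_values) for sd in state_dicts):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(len(ref_values))
        ]
    return merged


# Usage: two toy "fine-tuned" checkpoints with a single parameter each.
model_a = {"layer.weight": [1.0, 3.0]}
model_b = {"layer.weight": [3.0, 5.0]}
print(merge_state_dicts([model_a, model_b]))              # {'layer.weight': [2.0, 4.0]}
print(merge_state_dicts([model_a, model_b], [3.0, 1.0]))  # {'layer.weight': [1.5, 3.5]}
```

Averaging in parameter space only makes sense when the checkpoints were fine-tuned from the same base model, which is why the sketch rejects mismatched parameter names and sizes rather than trying to align them.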

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition