QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration

Shaola Ren; Li Ke; Longtao Huang; Dehong Gao; Hui Xue

arXiv:2502.18480 · cs.IR · February 27, 2025

TL;DR

QExplorer leverages large language models with a two-stage training process to improve query extraction for toxic content exploration, outperforming humans and other LLMs in offline tests and increasing toxic item detection online.

Contribution

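This paper introduces QExplorer, an LLM-based query extraction method for toxic content exploration. It is trained in two stages, instruction Supervised FineTuning (SFT) followed by preference alignment with Direct Preference Optimization (DPO), on datasets constructed from the feedback of a real-world search system.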

Findings

01 Outperforms several LLMs and humans in offline query extraction tasks.

02 Significantly increases detection of toxic items in online deployment.

03 Demonstrates effectiveness through offline and online experiments.

Abstract

Automatically extracting effective queries is challenging in information retrieval, especially in toxic content exploration, as such content is likely to be disguised. With the recent achievements in generative Large Language Models (LLMs), we are able to leverage their capabilities to extract effective queries for similar content exploration directly. This study proposes QExplorer, a large language model based Query Extraction approach for toxic content Exploration. The QExplorer approach involves a 2-stage training process: instruction Supervised FineTuning (SFT) and preference alignment using Direct Preference Optimization (DPO), as well as dataset construction using feedback from the search system. To verify the effectiveness of QExplorer, a series of offline and online experiments are conducted on our real-world system. The offline empirical results demonstrate that the…
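The second training stage named in the abstract, DPO, can be illustrated with the standard per-pair DPO loss. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the log-probabilities are plain floats supplied by the caller, `beta` is an arbitrary default, and `build_preference_pair` with its numeric feedback score is a hypothetical stand-in for the search-feedback-based dataset construction, whose actual rules the abstract does not specify.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities (floats) of the chosen and
    rejected queries under the trained policy and a frozen reference model.
    """
    # Margin: how much more the policy favors the chosen query over the
    # rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def build_preference_pair(candidates):
    """Hypothetical helper: choose (chosen, rejected) queries.

    `candidates` is a list of (query, feedback_score) tuples, where the
    score is an assumed numeric signal from the search system (higher is
    better). This selection rule is an assumption for illustration only.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]
```

When the policy and reference agree on both queries, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen query than the reference does.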

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques