QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration
Shaola Ren, Li Ke, Longtao Huang, Dehong Gao, Hui Xue

TL;DR
QExplorer leverages large language models with a two-stage training process to improve query extraction for toxic content exploration, outperforming humans and other LLMs in offline tests and increasing toxic item detection online.
Contribution
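This paper introduces QExplorer, a novel LLM-based query extraction method for toxic content exploration. It combines a two-stage training process (instruction SFT followed by DPO preference alignment) with a dataset-construction procedure driven by feedback from a real-world search system.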
Findings
Outperforms several LLMs and humans in offline query extraction tasks.
Significantly increases detection of toxic items in online deployment.
Demonstrates effectiveness through offline and online experiments.
Abstract
Automatically extracting effective queries is challenging in information retrieval, especially in toxic content exploration, because such content is likely to be disguised. Recent advances in generative Large Language Models (LLMs) make it possible to use LLMs to extract effective queries directly for exploring similar content. This study proposes QExplorer, a large language model based approach to Query Extraction for toxic content Exploration. QExplorer involves a two-stage training process, instruction Supervised Fine-Tuning (SFT) followed by preference alignment using Direct Preference Optimization (DPO), together with dataset construction that uses feedback from the search system. To verify the effectiveness of QExplorer, a series of offline and online experiments is conducted on our real-world system. The offline empirical results demonstrate that the…
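The abstract names DPO and search-system feedback but gives no implementation details, so the following is only a minimal sketch under stated assumptions. It assumes the feedback is a hypothetical per-query count of toxic items retrieved (`toxic_hits`), pairs a query with every lower-scoring query as (chosen, rejected), and scores each pair with the standard DPO objective, -log sigmoid(beta * [(log pi(chosen) - log pi_ref(chosen)) - (log pi(rejected) - log pi_ref(rejected))]). The names `Candidate`, `toxic_hits`, `build_preference_pairs`, and `min_margin` are illustrative, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    query: str
    toxic_hits: int  # HYPOTHETICAL feedback signal: toxic items the search system returned for this query


def build_preference_pairs(prompt, candidates, min_margin=1):
    """Assumed pairing rule: a query is 'chosen' over another if it retrieved
    at least `min_margin` more toxic items. Returns (prompt, chosen, rejected)."""
    pairs = []
    for better in candidates:
        for worse in candidates:
            if better.toxic_hits - worse.toxic_hits >= min_margin:
                pairs.append((prompt, better.query, worse.query))
    return pairs


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the trained policy and the frozen reference (e.g. SFT) model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    cands = [Candidate("q_a", 12), Candidate("q_b", 3), Candidate("q_c", 3)]
    for pair in build_preference_pairs("item text", cands):
        print(pair)
    # Policy identical to reference: loss equals log(2)
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```

With these made-up scores, "q_a" is preferred over both "q_b" and "q_c", while the tie between "q_b" and "q_c" yields no pair. In a real pipeline the log-probabilities would come from the LLM being aligned and from the frozen SFT checkpoint.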
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
