SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine
Hoang-Quoc Nguyen-Son, Minh-Son Dao, Koji Zettsu

TL;DR
SearchLLM detects LLM-paraphrased text by using a search engine to locate candidate sources for the input, regenerating those sources, and measuring how similar the regenerated versions are to the input. The authors report improved detection accuracy and greater robustness against paraphrasing attacks.
Contribution
The paper introduces SearchLLM, a proxy layer that sits in front of existing detectors and uses a search engine to identify LLM paraphrasing through source similarity analysis.
Findings
SearchLLM improves detection accuracy across various LLMs.
It enhances existing detectors' ability to identify closely mimicked paraphrased text.
It makes detection more robust against paraphrasing attacks.
Abstract
With the advent of large language models (LLMs), it has become common practice for users to draft text and utilize LLMs to enhance its quality through paraphrasing. However, this process can sometimes result in the loss or distortion of the original intended meaning. Due to the human-like quality of LLM-generated text, traditional detection methods often fail, particularly when text is paraphrased to closely mimic original content. In response to these challenges, we propose a novel approach named SearchLLM, designed to identify LLM-paraphrased text by leveraging search engine capabilities to locate potential original text sources. By analyzing similarities between the input and regenerated versions of candidate sources, SearchLLM effectively distinguishes LLM-paraphrased content. SearchLLM is designed as a proxy layer, allowing seamless integration with existing detectors to enhance…
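The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `search`, `regenerate`, and `fallback_detector` callables, the `threshold` value, and the use of `difflib` as the similarity measure are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    A stand-in for illustration; the paper's actual metric is not given here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_llm_proxy(
    text: str,
    search: Callable[[str], Iterable[str]],      # hypothetical: query -> candidate sources
    regenerate: Callable[[str], str],            # hypothetical: source -> regenerated text
    fallback_detector: Callable[[str], bool],    # hypothetical: an existing detector
    threshold: float = 0.8,                      # assumed cutoff, not from the paper
) -> bool:
    """Return True if `text` is judged to be LLM-paraphrased."""
    best = 0.0
    for candidate in search(text):
        # Regenerate the candidate source and compare it with the input.
        best = max(best, similarity(text, regenerate(candidate)))
    if best >= threshold:
        # The input closely matches a regenerated source.
        return True
    # Otherwise act as a proxy and defer to the wrapped detector.
    return fallback_detector(text)
```

In this sketch the proxy only overrides the wrapped detector when a regenerated candidate is highly similar to the input; every other input is passed through unchanged, which is one plausible reading of the "proxy layer" design.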
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Spam and Phishing Detection
