TL;DR
GigaCheck is a dual-strategy framework for AI-generated text detection. It combines document-level authorship classification with span-level localization, treating generated text segments as objects and applying a DETR-like vision model to locate them, with the aim of more robust detection.
Contribution
The paper presents a novel approach that adapts visual object detection models for precise localization of AI-generated text spans, enhancing detection robustness and generalization.
Findings
High accuracy in authorship classification across multiple benchmarks.
Effective localization of generated text spans using DETR-like models.
Demonstrated robustness and generalization of the approach across tasks.
Abstract
With the increasing quality and spread of LLM assistants, the amount of generated content is growing rapidly. In many cases and tasks, such texts are already indistinguishable from those written by humans, and the quality of generation continues to increase. At the same time, detection methods are advancing more slowly than generation models, making it challenging to prevent misuse of generative AI technologies. We propose GigaCheck, a dual-strategy framework for AI-generated text detection. At the document level, we leverage the representation learning of fine-tuned LLMs to discern authorship with high data efficiency. At the span level, we introduce a novel structural adaptation that treats generated text segments as "objects." By integrating a DETR-like vision model with linguistic encoders, we achieve precise localization of AI intervals, effectively transferring the robustness of…
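To make the "segments as objects" idea concrete, here is a minimal sketch of how AI-written token spans could be encoded as one-dimensional boxes, the kind of target a DETR-like detector regresses. The abstract does not specify the encoding, so the normalized (center, width) format, the half-open token intervals, and the helper names spans_to_targets, targets_to_spans, and interval_iou are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token interval [start, end); an assumed convention


def spans_to_targets(spans: List[Span], n_tokens: int) -> List[Tuple[float, float]]:
    """Encode token spans as normalized (center, width) pairs in [0, 1].

    This is the 1D analogue of the (cx, cy, w, h) boxes used in visual
    object detection; the exact format is an assumption of this sketch.
    """
    targets = []
    for start, end in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span {(start, end)} for length {n_tokens}")
        center = (start + end) / 2 / n_tokens
        width = (end - start) / n_tokens
        targets.append((center, width))
    return targets


def targets_to_spans(targets: List[Tuple[float, float]], n_tokens: int) -> List[Span]:
    """Decode normalized (center, width) predictions back to token spans."""
    spans = []
    for center, width in targets:
        start = round((center - width / 2) * n_tokens)
        end = round((center + width / 2) * n_tokens)
        # Clamp to the document so slightly out-of-range predictions stay valid.
        spans.append((max(0, start), min(n_tokens, end)))
    return spans


def interval_iou(a: Span, b: Span) -> float:
    """Intersection over union of two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A hypothetical 100-token document with two AI-written intervals.
    gold = [(10, 30), (60, 100)]
    encoded = spans_to_targets(gold, n_tokens=100)
    print(encoded)
    print(targets_to_spans(encoded, n_tokens=100))
    print(interval_iou((10, 30), (20, 40)))
```

Interval IoU is the natural way to match and score predicted spans against labeled ones in this formulation, in the same role box IoU plays for images.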
