TL;DR
This paper introduces a domain-adaptive detection method for AI-generated texts using a fine-tuned RoBERTa-Ranker, outperforming existing tools across multiple domains with minimal labeled data.
Contribution
It proposes a novel fine-tuning approach for RoBERTa-Ranker that enhances cross-domain detection of AI-generated texts with limited labeled data.
Findings
Outperforms DetectGPT and GPTZero on both in-domain and cross-domain texts
Requires only small labeled datasets for effective fine-tuning
Enables a single system to detect AI-generated texts across various domains
Abstract
Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a modified version of RoBERTa, as a baseline model using a dataset we constructed that includes a wider variety of texts written by humans and generated by various LLMs. We then present a method to fine-tune RoBERTa-Ranker that requires only a small amount of labeled data in a new domain. Experiments show that this fine-tuned domain-aware model outperforms the popular DetectGPT and GPTZero on both in-domain and cross-domain texts, where AI-generated texts may either be in a different domain or generated by a different LLM not used to generate the training datasets. This approach makes it feasible and economical to build a single system to…
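The abstract calls RoBERTa-Ranker a ranking classifier but does not state its training objective. As a rough illustration of what a ranking-style objective can look like, the sketch below computes a generic pairwise logistic ranking loss over detector scores, where a higher score means "more likely AI-generated". The function name, the score convention, and the choice of loss are all assumptions made for illustration; this is not the paper's method.

```python
import math


def pairwise_ranking_loss(ai_scores, human_scores):
    """Mean logistic loss over every (AI, human) score pair.

    Hypothetical helper, not taken from the paper. The loss is small when
    each AI-generated text scores higher than each human-written text.
    """
    if not ai_scores or not human_scores:
        raise ValueError("both score lists must be non-empty")
    total = 0.0
    for s_ai in ai_scores:
        for s_human in human_scores:
            margin = s_ai - s_human
            # softplus(-margin) = log(1 + exp(-margin)), computed stably
            total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / (len(ai_scores) * len(human_scores))


# Correctly ordered scores give a lower loss than reversed ones.
well_ranked = pairwise_ranking_loss([3.0, 2.0], [-1.0, 0.0])
mis_ranked = pairwise_ranking_loss([-1.0, 0.0], [3.0, 2.0])
print(well_ranked < mis_ranked)
```

In a fine-tuning setting like the one described, the scores would come from the model's output on a small labeled batch from the new domain, and a loss of this kind would be minimized by gradient descent; how the authors actually do this is not specified in the text above.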
Taxonomy
Methods: Attention Is All You Need · Dropout · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Weight Decay · Adam · Attention Dropout
