DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma

TL;DR
DuoLens introduces a fine-tuned Small Language Model framework that significantly improves the accuracy and efficiency of detecting machine-generated multilingual text and code, outperforming larger models with less computational cost.
Contribution
The paper demonstrates that fine-tuned encoder-only SLMs such as RoBERTa and CodeBERTa outperform LLMs at binary classification of machine-generated content, with higher accuracy and lower resource usage.
Findings
Achieves AUROC of 0.97-0.99 and macro-F1 of 0.89-0.94.
Reduces latency by 8-12 times and VRAM by 3-5 times.
Maintains over 92% AUROC under adversarial transformations.
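To make the two headline metrics concrete, here is a minimal pure-Python sketch of how AUROC and macro-F1 can be computed for a binary human-vs-machine detector. It is not the paper's evaluation code; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen for illustration only.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive (machine-generated)
    example receives a higher score than a random negative (human-written)
    one. Ties count as half a win. O(n_pos * n_neg), fine for small sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    f1_scores = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (hypothetical, not from the paper): 1 = machine, 0 = human.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUROC    = {auroc(labels, scores):.3f}")
print(f"macro-F1 = {macro_f1(labels, preds):.3f}")
```

AUROC is threshold-free and depends only on how the detector ranks examples, whereas macro-F1 depends on the chosen threshold and weights both classes equally, which is why the two figures are reported side by side.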
Abstract
The prevalence of Large Language Models (LLMs) for generating multilingual text and source code has only increased the imperative for machine-generated content detectors to be accurate and efficient across domains. Current detectors, predominantly utilizing zero-shot methods such as Fast DetectGPT or GPTZero, either incur high computational cost or lack sufficient accuracy, often with a trade-off between the two, leaving room for further improvement. To address these gaps, we propose fine-tuning encoder-only Small Language Models (SLMs), in particular the pre-trained RoBERTa and CodeBERTa models, on specialized datasets of source code and natural language to show that, for the task of binary classification, SLMs outperform LLMs by a wide margin while using a fraction of the compute. Our encoders achieve AUROC of 0.97 to 0.99 and macro-F1 of 0.89 to 0.94 while…
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
