DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code

Shriyansh Agrawal; Aidan Lau; Sanyam Shah; Ahan M R; Kevin Zhu; Sunishchal Dev; Vasu Sharma

arXiv:2510.18904 · cs.CL · October 23, 2025

Open Access

TL;DR

DuoLens introduces a fine-tuned Small Language Model framework that significantly improves the accuracy and efficiency of detecting machine-generated multilingual text and code, outperforming larger models with less computational cost.

Contribution

The paper demonstrates that fine-tuned encoder-only SLMs such as RoBERTa and CodeBERTa outperform LLMs at binary classification of machine-generated content, with higher accuracy and lower resource usage.

Findings

01 Achieves AUROC of 0.97-0.99 and macro-F1 of 0.89-0.94.

02 Reduces latency by 8-12 times and VRAM by 3-5 times.

03 Maintains over 92% AUROC under adversarial transformations.
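For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how AUROC and macro-F1 can be computed for a binary human-vs-machine detector. It is not the paper's evaluation code: the function names `auroc` and `macro_f1`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for classes 0 and 1."""
    f1_scores = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 1 = machine-generated, 0 = human-written.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # detector's "machine" probability
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUROC    = {auroc(labels, scores):.3f}")
print(f"macro-F1 = {macro_f1(labels, preds):.3f}")
```

AUROC is threshold-free and depends only on how the scores rank the two classes, whereas macro-F1 depends on the chosen decision threshold and weights both classes equally. That difference is one reason the two numbers can diverge for the same detector.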

Abstract

The prevalence of Large Language Models (LLMs) for generating multilingual text and source code has only increased the imperative for machine-generated content detectors to be accurate and efficient across domains. Current detectors, predominantly utilizing zero-shot methods such as Fast-DetectGPT or GPTZero, either incur high computational cost or lack sufficient accuracy, often with a trade-off between the two, leaving room for further improvement. To address these gaps, we propose the fine-tuning of encoder-only Small Language Models (SLMs), in particular the pre-trained models RoBERTa and CodeBERTa, using specialized datasets on source code and other natural language to prove that, for the task of binary classification, SLMs outperform LLMs by a huge margin whilst using a fraction of the compute. Our encoders achieve AUROC $= 0.97$ to $0.99$ and macro-F1 $0.89$ to $0.94$ while…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning