UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text using Transformer Ensembles
Andrei-Alexandru Preda, Dumitru-Clementin Cercel, Traian Rebedea, Costin-Gabriel Chiru

TL;DR
This paper presents UPB's approach to detecting machine-generated text in a bilingual (English and Spanish) dataset using Transformer ensembles, multi-task learning, and virtual adversarial training, achieving macro F1-scores of about 67%.
Contribution
The paper combines ensembles of Transformer models with multi-task learning and virtual adversarial training to detect machine-generated text in English and Spanish.
Findings
Achieved macro F1-scores of 66.63% (English) and 67.10% (Spanish)
Demonstrated effectiveness of ensemble and adversarial training techniques
Participated in IberLEF-2023 shared task with competitive results
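The macro F1-scores listed above average the per-class F1 over both classes, so the human and machine-generated classes count equally regardless of how many documents each has. The sketch below shows that computation in plain Python; the function name `macro_f1` and the label encoding (0 for human, 1 for machine-generated) are assumptions made for illustration and are not taken from the paper.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores.

    Assumed encoding (illustrative only): 0 = human, 1 = machine-generated.
    """
    per_class = []
    for label in labels:
        # Count true positives, false positives, false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example with four documents (made-up labels, not data from the task).
gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
score = macro_f1(gold, pred)
```

In this toy example the human class has F1 = 2/3 and the machine-generated class has F1 = 4/5, so the macro average is 11/15.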
Abstract
This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023. Our team participated in the first subtask, identifying text documents produced by large language models instead of humans. The organizers provided a bilingual dataset for this subtask, comprising English and Spanish texts covering multiple domains, such as legal texts, social media posts, and how-to articles. We experimented mostly with deep learning models based on Transformers, as well as training techniques such as multi-task learning and virtual adversarial training to obtain better results. We submitted three runs, two of which consisted of ensemble models. Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
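The abstract states that two of the three runs were ensembles but does not say here how member predictions were combined. As one common option, the sketch below uses soft voting: it averages each model's predicted probability that a document is machine-generated and thresholds the mean. The function name `soft_vote`, the 0.5 threshold, and the example probabilities are all assumptions for illustration, not the authors' method.

```python
def soft_vote(model_probs, threshold=0.5):
    """Combine per-model probabilities by averaging (soft voting).

    model_probs: list of lists; model_probs[m][i] is model m's probability
    that document i is machine-generated (hypothetical inputs).
    Returns one 0/1 label per document (1 = machine-generated).
    """
    n_models = len(model_probs)
    n_docs = len(model_probs[0])
    # Every model must score the same set of documents.
    if any(len(probs) != n_docs for probs in model_probs):
        raise ValueError("all models must score the same number of documents")
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_docs)
    ]
    return [1 if p >= threshold else 0 for p in averaged]


# Three hypothetical models scoring three documents.
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.1, 0.5],
]
labels = soft_vote(probs)
```

Averaging probabilities lets a confident model outweigh two uncertain ones, whereas majority voting over hard labels would treat all members equally; which behaviour is preferable depends on how well calibrated the individual models are.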
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
