UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text using Transformer Ensembles

Andrei-Alexandru Preda; Dumitru-Clementin Cercel; Traian Rebedea; Costin-Gabriel Chiru

arXiv:2308.01408·cs.CL·August 4, 2023

TL;DR

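This paper presents the UPB team's approach to detecting machine-generated text in English and Spanish, using Transformer ensembles together with multi-task learning and virtual adversarial training. The best-performing model reaches macro F1-scores of 66.63% (English) and 67.10% (Spanish).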

Contribution

The paper describes ensembles of Transformer-based models, trained with multi-task learning and virtual adversarial training, for detecting machine-generated text in English and Spanish.

Findings

01

The best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset

02

Explored ensembling, multi-task learning, and virtual adversarial training as ways to obtain better results

03

Submitted three runs to the first subtask of the IberLEF-2023 AuTexTification shared task, two of which were ensemble models

Abstract

This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023. Our team participated in the first subtask, identifying text documents produced by large language models instead of humans. The organizers provided a bilingual dataset for this subtask, comprising English and Spanish texts covering multiple domains, such as legal texts, social media posts, and how-to articles. We experimented mostly with deep learning models based on Transformers, as well as training techniques such as multi-task learning and virtual adversarial training to obtain better results. We submitted three runs, two of which consisted of ensemble models. Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
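
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score is computed for a two-class detection task: the F1-score is calculated separately for each class and the per-class scores are averaged with equal weight. The label names `human` and `generated`, the function names, and the toy predictions are assumptions made for illustration only; they are not taken from the paper or from the task's evaluation script.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1-score of a single class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0 when the class never appears in labels or predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred), key=str)
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy data: four documents, one human text misclassified.
y_true = ["human", "human", "generated", "generated"]
y_pred = ["human", "generated", "generated", "generated"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because each class contributes equally to the average, a detector cannot score well by favoring one class; both human-written and machine-generated texts must be recognized reliably.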

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification