DIALOG-22 RuATD Generated Text Detection
Narek Maloyan, Bulat Nutfullin, Eugene Ilyushin

TL;DR
This paper presents a pipeline for distinguishing text produced by text generation models (TGMs) from human-written text and for identifying which model generated a given text. Using an ensemble of pre-trained attention-based models, it achieved the highest accuracy in the DIALOG-22 RuATD binary detection task.
Contribution
The paper introduces an ensemble of pre-trained attention-based models for detecting TGM-generated text and classifying which model produced it, with results that include 1st place in the binary task of the DIALOG-22 RuATD challenge.
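The source does not state how the individual models' outputs are combined. One common option is soft voting: average each model's predicted class probabilities and pick the most probable class. The sketch below assumes that scheme; the function name `soft_vote`, the optional `weights`, the label encoding, and the toy probabilities are hypothetical illustrations, not the authors' code.

```python
from typing import List, Optional, Sequence


# Hypothetical helper: soft-voting ensemble. This combination rule is an
# assumption for illustration, not necessarily the one used in the paper.
def soft_vote(
    prob_matrices: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return one predicted class index per text.

    prob_matrices[m][i][c] is the probability that model m assigns class c
    to text i. The same code covers a binary task (2 classes) and a
    multiclass task (one class per candidate source).
    """
    if not prob_matrices:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(prob_matrices)
    if len(weights) != len(prob_matrices):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_texts = len(prob_matrices[0])
    n_classes = len(prob_matrices[0][0])
    labels = []
    for i in range(n_texts):
        # Weighted average of class probabilities across all models.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_matrices)) / total
            for c in range(n_classes)
        ]
        # Choose the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=lambda c: avg[c]))
    return labels


# Toy example: three models, two texts, two classes
# (hypothetical encoding: 0 = human-written, 1 = generated).
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.2, 0.8]]
model_c = [[0.3, 0.7], [0.7, 0.3]]
print(soft_vote([model_a, model_b, model_c]))  # [0, 1]
```

With equal weights this reduces to a plain average; unequal weights let stronger models count more. Hard majority voting over each model's argmax label is a simpler alternative that ignores confidence.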
Findings
Achieved 1st place in binary detection with 82.995% accuracy on the private test set.
Secured 4th place in multiclass classification with 62.856% accuracy on the private test set.
Proposed an ensemble of different pre-trained models based on the attention mechanism.
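The reported scores are accuracies: the fraction of private-test texts whose predicted label matches the gold label, so 0.82995 and 82.995% are the same number. A minimal sketch of that metric follows; the function name, label strings, and toy predictions are hypothetical, and the same function applies unchanged to the multiclass task, where each label names a text source.

```python
from typing import Sequence


def accuracy(y_true: Sequence[object], y_pred: Sequence[object]) -> float:
    """Fraction of texts whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical binary labels: "H" = human-written, "M" = machine-generated.
y_true = ["H", "M", "M", "H"]
y_pred = ["H", "M", "H", "H"]
score = accuracy(y_true, y_pred)
# Print the score both as a fraction and as a percentage.
print(f"{score:.5f} = {score:.3%}")  # 0.75000 = 75.000%
```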
Abstract
Text generation models (TGMs) can produce text that matches human language style reasonably well. Detectors that distinguish TGM-generated text from human-written text play an important role in preventing the abuse of TGMs. In this paper, we describe our pipeline for the two DIALOG-22 RuATD tasks: detecting generated text (binary task) and classifying which model was used to generate a text (multiclass task). We achieved 1st place in the binary task with an accuracy of 0.82995 on the private test set and 4th place in the multiclass task with an accuracy of 0.62856 on the private test set. We propose an ensemble of different pre-trained models based on the attention mechanism.
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Authorship Attribution and Profiling
