TOPFORMER: Topology-Aware Authorship Attribution of Deepfake Texts with Diverse Writing Styles

Adaku Uchendu; Thai Le; Dongwon Lee

arXiv:2309.12934 · cs.CL · October 3, 2024 · 5 citations

TL;DR

TopFormer is a novel topology-aware transformer model that enhances authorship attribution of deepfake texts by integrating topological data analysis, significantly improving detection accuracy across diverse datasets.

Contribution

The paper introduces TopFormer, a transformer model with a TDA layer that captures linguistic structures, advancing deepfake authorship attribution methods.

Findings

1. TopFormer outperforms baseline models with up to 7% higher Macro F1 score.

2. TDA features improve performance on imbalanced and multi-style datasets.

3. Incorporating TDA enhances the model's ability to detect diverse deepfake texts.
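
Macro F1, the metric cited in the findings above, is the unweighted mean of per-class F1 scores, so every author class counts equally regardless of how many texts it has. The sketch below is a minimal pure-Python illustration, not the paper's evaluation code; the labels `human` and `model_a` and the toy predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 8 human-written texts, 2 from one generator.
y_true = ["human"] * 8 + ["model_a"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["human"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

On this toy data the majority-class predictor reaches 0.8 accuracy but only about 0.44 Macro F1, because the minority class contributes an F1 of 0. That gap is why Macro F1 is informative on imbalanced datasets.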

Abstract

Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 72K text generation models in the Hugging Face model repository. As such, users with malicious intent can easily use these open-source LLMs to generate harmful texts and dis/misinformation at scale. To mitigate this problem, a computational method to determine whether a given text is a deepfake text is desired, i.e., a Turing Test (TT). In this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting, i.e., not only determining whether a given text is a deepfake text but also pinpointing which LLM is the author. We propose TopFormer to improve…
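
The relationship between the two tasks in the abstract can be made concrete: multi-class AA assigns each text to one author, and the binary TT label follows by collapsing every non-human author into a single deepfake class. The snippet below is a minimal sketch of that mapping under assumed label names (`human`, `model_a`, `model_b` are hypothetical and not taken from the paper's datasets).

```python
def to_turing_test_label(author, human_label="human"):
    """Collapse a multi-class AA label into a binary TT label."""
    return "human" if author == human_label else "deepfake"


# Hypothetical AA predictions for four texts.
aa_predictions = ["human", "model_a", "model_b", "human"]
tt_predictions = [to_turing_test_label(a) for a in aa_predictions]
print(tt_predictions)
```

The mapping only works in one direction: a TT label cannot be expanded back into an author, which is the sense in which AA is the more general problem.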

Code & Models

Repositories

adauchendu/topformer
PyTorch · Official

Taxonomy

Topics: Authorship Attribution and Profiling

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · WordPiece · Dropout · Dense Connections · Linear Layer · Softmax · Linear Warmup With Linear Decay