Model Attribution in LLM-Generated Disinformation: A Domain Generalization Approach with Supervised Contrastive Learning

Alimohammad Beigi, Zhen Tan, Nivedh Mudiam, Canyu Chen, Kai Shu, and Huan Liu

arXiv:2407.21264 · cs.CL · August 15, 2024

Open Access

TL;DR

This paper presents a domain generalization approach using supervised contrastive learning to improve model attribution of LLM-generated disinformation across diverse prompting methods and models.

Contribution

It introduces a novel domain generalization framework for model attribution in disinformation detection, leveraging supervised contrastive learning to handle prompt diversity.

Findings

01. Achieves state-of-the-art performance in attribution accuracy

02. Robustly handles unseen datasets and prompt variations

03. Effective across multiple LLMs and prompting methods

Abstract

Model attribution for LLM-generated disinformation poses a significant challenge in understanding its origins and mitigating its spread. This task is especially challenging because modern large language models (LLMs) produce disinformation with human-like quality. Additionally, the diversity in prompting methods used to generate disinformation complicates accurate source attribution. These methods introduce domain-specific features that can mask the fundamental characteristics of the models. In this paper, we introduce the concept of model attribution as a domain generalization problem, where each prompting method represents a unique domain. We argue that an effective attribution model must be invariant to these domain-specific features. It should also be proficient in identifying the originating models across all scenarios, reflecting real-world detection challenges. To address this,…
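The abstract is cut off before the method is described, so the following is only a minimal sketch of a generic supervised contrastive loss, not the authors' implementation. It assumes that each text has already been encoded into a vector, that the label is the LLM that generated the text, and that samples sharing a label may come from different prompting methods (the "domains" in the abstract). The function name `supervised_contrastive_loss` and the `temperature` default are illustrative choices, not taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative sketch).

    embeddings: (n, d) array of text representations (assumed precomputed).
    labels:     (n,) array of source-model ids; same id means positive pair.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)

    # Work with cosine similarity by L2-normalising each embedding.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Subtract the row maximum for numerical stability (does not change the loss).
    sim = sim - sim.max(axis=1, keepdims=True)

    # Each anchor is contrasted against every other sample, never against itself.
    not_self = ~np.eye(n, dtype=bool)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Positives: other samples generated by the same source model.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        return 0.0

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    # Toy batch: two hypothetical source models, two texts each.
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_labels = [0, 0, 1, 1]
    print(supervised_contrastive_loss(toy_embeddings, toy_labels))
```

Because positives are defined only by the source model, texts produced by the same LLM under different prompting methods are pulled together in embedding space, which is one way to encourage the prompt-invariant representations the abstract argues for.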

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications

Methods: Contrastive Learning