J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Tharindu Kumarage, Amrita Bhattacharjee, Djordje Padejski, Kristy Roschke, Dan Gillmor, Scott Ruston, Huan Liu, Joshua Garland

TL;DR
J-Guard is a novel framework that enhances the detection of AI-generated news by leveraging journalistic stylistic cues, significantly improving robustness against adversarial attacks while maintaining high detection accuracy.
Contribution
This paper introduces J-Guard, a new method that guides existing detectors with journalistic stylistic features to improve adversarial robustness in AI-generated news detection.
Findings
J-Guard improves the accuracy of detecting AI-generated news.
J-Guard largely maintains detection performance under adversarial attacks, with only a 7% decrease.
J-Guard is effective across multiple AI models, including ChatGPT (GPT3.5).
Abstract
The rapid proliferation of AI-generated text online is profoundly reshaping the information landscape. Among various types of AI-generated text, AI-generated news presents a significant threat as it can be a prominent source of misinformation online. While several recent efforts have focused on detecting AI-generated text in general, these methods require enhanced reliability, given concerns about their vulnerability to simple adversarial attacks. Furthermore, due to the eccentricities of news writing, applying these detection methods for AI-generated news can produce false positives, potentially damaging the reputation of news organizations. To address these challenges, we leverage the expertise of an interdisciplinary team to develop a framework, J-Guard, capable of steering existing supervised AI text detectors for detecting AI-generated news while boosting adversarial robustness. By…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
