DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution
L. D. M. S. Sai Teja, N. Siva Gopala Krishna, Ufaq Khan, Muhammad Haris Khan, Atul Mishra

TL;DR
This paper presents Info-Mask, a novel framework for detecting and segmenting mixed human-AI authored texts, incorporating stylometric cues, adversarial robustness, and human-interpretable attributions to improve trust and oversight.
Contribution
We introduce Info-Mask, a new method that combines stylometric cues, perplexity signals, and boundary modeling for accurate mixed-authorship segmentation, along with an adversarial benchmark dataset, MAS.
Findings
Info-Mask improves segmentation robustness against adversarial attacks.
The framework provides human-interpretable attributions for boundary decisions.
Our system establishes new performance baselines in mixed-authorship detection.
Abstract
In the age of advanced large language models (LLMs), the boundaries between human-written and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-authorship text, that is, identifying transition points in text where authorship shifts from human to AI or vice versa, a problem with critical implications for authenticity, trust, and human oversight. We introduce a novel framework, called Info-Mask, for mixed-authorship detection that integrates stylometric cues, perplexity-driven signals, and structured boundary modeling to accurately segment collaborative human-AI content. To evaluate the robustness of our system against adversarial perturbations, we construct and release an adversarial benchmark dataset, Mixed-text Adversarial setting for Segmentation (MAS), designed to probe the limits of existing detectors. Beyond segmentation accuracy, we introduce…
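To make the segmentation task concrete, the following is a minimal sketch of locating a single authorship transition point from per-sentence scores. It is not the Info-Mask method, whose details are not given in this summary. The function name `best_boundary` and the idea of a single scalar score per sentence (for example, a perplexity-like value) are assumptions made purely for illustration; the sketch uses a generic least-squares change-point search.

```python
from typing import Sequence


def best_boundary(scores: Sequence[float]) -> int:
    """Return the split index k (1 <= k < len(scores)) that minimizes the
    total within-segment squared error of scores[:k] and scores[k:].

    `scores` is a hypothetical per-sentence signal; how it is computed is
    outside the scope of this sketch.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two sentences to place a boundary")

    def sse(segment: Sequence[float]) -> float:
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    # Try every possible split and keep the one with the lowest total error.
    return min(range(1, n), key=lambda k: sse(scores[:k]) + sse(scores[k:]))


# Usage: six sentences whose (made-up) scores drop sharply after the third.
example_scores = [0.9, 0.8, 0.85, 0.2, 0.1, 0.15]
boundary = best_boundary(example_scores)
```

In this toy input the scores form two clearly separated groups, so the search places the boundary between the third and fourth sentences. A real detector would have to handle multiple transitions and adversarially perturbed text, which is the setting the paper targets.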
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
