Machine Text Detectors are Membership Inference Attacks

Ryuto Koike; Liam Dugan; Masahiro Kaneko; Chris Callison-Burch; Naoaki Okazaki

arXiv:2510.19492·cs.CL·February 11, 2026


Open Access · 3 Reviews

TL;DR

This paper explores the theoretical and empirical connections between membership inference attacks and machine-generated text detection, revealing transferability of methods and introducing a unified evaluation suite.

Contribution

It demonstrates the transferability between MIAs and text detection, unifies methods under an optimal metric, and introduces MINT for cross-task evaluation.

Findings

01. Strong correlation in cross-task performance (ρ ≈ 0.7)
02. Machine text detectors perform well on both tasks
03. Unified evaluation suite MINT implemented with 15 methods
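To make the reported ρ ≈ 0.7 concrete, here is a minimal sketch of how a rank correlation between per-method scores on the two tasks could be computed. It assumes ρ is a rank correlation of the Spearman kind, which this page does not state explicitly (Reviewer 02 only notes that the main results are rank correlations). The AUROC values and the names `mia_auroc` and `detection_auroc` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical AUROC of five methods on each task (illustrative numbers only).
mia_auroc = [0.55, 0.61, 0.58, 0.70, 0.66]
detection_auroc = [0.72, 0.80, 0.85, 0.93, 0.78]

rho = spearman(mia_auroc, detection_auroc)
print(f"rank correlation across tasks: {rho:.2f}")
```

A value near 1 would mean that methods ranked highly on one task are also ranked highly on the other. Because only the ordering matters, the statistic says nothing about how close the absolute scores are.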

Abstract

Although membership inference attacks (MIAs) and machine-generated text detection target different goals, their methods often exploit similar signals based on a language model's probability distribution, and the two tasks have been studied independently. This can result in conclusions that overlook stronger methods and valuable insights from the other task. In this work, we theoretically and empirically demonstrate the transferability, i.e., how well a method originally developed for one task performs on the other, between MIAs and machine text detection. We prove that the metric achieving asymptotically optimal performance is identical for both tasks. We unify existing methods under this optimal metric and hypothesize that the accuracy with which a method approximates this metric is directly correlated with its transferability. Our large-scale empirical experiments demonstrate very…
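The shared signal mentioned in the abstract can be illustrated with the simplest likelihood-based score. The sketch below is not the paper's optimal metric: it is a plain mean log-likelihood baseline, and the per-token probabilities, the threshold, and the function names are all hypothetical. It only shows that the same score and decision rule can be read as a membership test or as a machine-text test.

```python
import math


def mean_log_likelihood(token_probs):
    """Average log-probability a language model assigns to the observed tokens."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def classify(token_probs, threshold):
    """Return True when the score exceeds the threshold.

    In an MIA, True is read as "likely in the training data".
    In machine text detection, True is read as "likely machine-generated".
    """
    return mean_log_likelihood(token_probs) > threshold


# Hypothetical per-token probabilities from some language model.
familiar_text = [0.60, 0.45, 0.70, 0.55]    # the model finds these tokens likely
unfamiliar_text = [0.05, 0.20, 0.10, 0.02]  # the model finds these tokens unlikely

THRESHOLD = -1.5  # hypothetical; in practice chosen for a target false-positive rate

print(classify(familiar_text, THRESHOLD))
print(classify(unfamiliar_text, THRESHOLD))
```

The only thing that changes between the two tasks here is the interpretation of the positive class, which is why a method tuned for one task can be evaluated directly on the other.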

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

Equivalence and optimality proofs are always valuable in the context of algorithm development, given their ability to provide definitive and incontestable insights into the performance of different method classes. The authors' focus on Type I error, most concerning in the context of LLM detectors, in their proofs is a strong point, making this paper more relevant to the LLM detectors community.

Weaknesses

I am not convinced of the interest or contribution of this paper, at least not to a level that warrants acceptance to a conference such as ICLR. Specifically:
- The authors' theory hinges on the assumption that texts from its training dataset are generated by a model with a lower perplexity. While generally assumed, more recent results suggest that this might not be the case, with recent findings [1] indicating that even low-perplexity LLM-generated sequences do not map directly to the trainin

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The theoretical section presents evidence for the target similarity of the two tasks.
2. The experiments are comprehensive.
3. It is surprising to find MGT detection methods perform pretty well on MIA tasks.

Weaknesses

1. The experimental setting is not perfectly aligned for the two tasks. For example, the MIA generators are all Pythia models, but Pythia is not within the MGT generators. If there is any reason for this, it should be mentioned in the paper.
2. The theoretical section mainly suggests that the absolute performance is correlated, but the presented major results are on rank correlation, which weakens the findings. One reason for this might be the unaligned experiment setting.
3. Lack of explanati

Reviewer 03 · Rating 2 · Confidence 4

Strengths

There is a useful (and I think new) conclusion that Binoculars (developed for machine generated text detection) appears to outperform all of the specialised techniques for membership inference attacks on the problem of membership inference. The writing is very clear, I am not very familiar with the literature on membership inference attacks but had no problem understanding the points that the authors were making.

Weaknesses

Essentially all of the complaints below boil down to not feeling that the analysis is sufficiently in depth. 1) The choice of baselines for machine generated text detection appears incomplete, and nearly all are focused on the same essential strategy (comparing how likely a machine finds the text, either through log-likelihood or log-rank). I believe all of the techniques apart from Lastde take log-likelihood as their essential measure. What about other strategies taking a fundamentally differ


Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Authorship Attribution and Profiling