Minimizing Mismatch Risk: A Prototype-Based Routing Framework for Zero-shot LLM-generated Text Detection
Ke Sun, Guangsheng Bao, Han Cui, Yue Zhang

TL;DR
This paper introduces DetectRouter, a prototype-based routing framework that selects the most suitable surrogate model for each input in zero-shot LLM-generated text detection, improving robustness and accuracy over a fixed surrogate.
Contribution
It proposes a routing approach that learns to match surrogates to inputs, addressing the variation in detection performance caused by differences in surrogate-source alignment.
Findings
DetectRouter outperforms fixed surrogate methods on EvoBench and MAGE benchmarks.
The framework achieves consistent improvements across multiple detection criteria.
It effectively generalizes from white-box to black-box models through geometric alignment.
Abstract
Zero-shot methods detect LLM-generated text by computing statistical signatures using a surrogate model. Existing approaches typically employ a fixed surrogate for all inputs regardless of the unknown source. We systematically examine this design and find that detection performance varies substantially depending on surrogate-source alignment. We observe that while no single surrogate achieves optimal performance universally, a well-matched surrogate typically exists within a diverse pool for any given input. This finding transforms robust detection into a routing problem: selecting the most appropriate surrogate for each input. We propose DetectRouter, a prototype-based framework that learns text-detector affinity through two-stage training. The first stage constructs discriminative prototypes from white-box models; the second generalizes to black-box sources by aligning geometric…
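The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each surrogate is represented by a single prototype vector and that an input is routed to the surrogate whose prototype has the highest cosine similarity to the input's embedding. The names `route`, `detect`, `PROTOTYPES`, and `SCORERS`, the two-dimensional embeddings, and the constant scores are all hypothetical, and the two-stage training that produces the prototypes is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(embedding, prototypes):
    """Return the name of the surrogate whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda name: cosine(embedding, prototypes[name]))


def detect(embedding, text, prototypes, scorers, threshold):
    """Route the input to one surrogate, score it there, and compare to a threshold."""
    name = route(embedding, prototypes)
    score = scorers[name](text)
    return name, score, score > threshold


# Hypothetical prototypes: one vector per surrogate (assumed, not from the paper).
PROTOTYPES = {
    "surrogate_a": [1.0, 0.0],
    "surrogate_b": [0.0, 1.0],
}

# Hypothetical scorers: stand-ins for a zero-shot detection statistic
# computed with each surrogate. They return constants for illustration only.
SCORERS = {
    "surrogate_a": lambda text: 0.7,
    "surrogate_b": lambda text: 0.2,
}

result = detect([0.9, 0.1], "some input text", PROTOTYPES, SCORERS, threshold=0.5)
print(result)
```

In this toy setup the input embedding lies closer to the first prototype, so the first surrogate's statistic decides the outcome. A real system would replace the constant scorers with an actual zero-shot criterion and obtain embeddings and prototypes from a trained encoder.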
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Handwritten Text Recognition Techniques
