SERSEM: Selective Entropy-Weighted Scoring for Membership Inference in Code Language Models
Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar, Eray Tüzün, Sinem Sav

TL;DR
SERSEM is a novel white-box attack framework that enhances membership inference in code language models by focusing on human-centric coding anomalies using static analysis and internal activations.
Contribution
The paper introduces SERSEM, a method that combines static AST analysis with transformer activation pooling to improve membership inference attacks on code LLMs.
Findings
SERSEM achieves an AUC-ROC of 0.7913 on StarCoder2-3B.
SERSEM outperforms probability-based baselines in membership inference.
Focusing on coding anomalies yields more robust memorization indicators.
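For readers unfamiliar with the metric, the AUC-ROC figure above can be read as the probability that a randomly chosen training-set member receives a higher attack score than a randomly chosen non-member. The sketch below computes it directly from that definition. The function name `auc_roc` and the toy scores are illustrative assumptions, not taken from the paper.

```python
def auc_roc(member_scores, nonmember_scores):
    """AUC-ROC as a pairwise win rate (hypothetical helper, not from the paper).

    Counts how often a member's score exceeds a non-member's score,
    with ties counted as half a win, over all member/non-member pairs.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy scores, made up for illustration only.
members = [0.9, 0.8, 0.4]
nonmembers = [0.7, 0.3, 0.2]
print(auc_roc(members, nonmembers))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so 0.7913 indicates that members are ranked above non-members in roughly four out of five pairs.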
Abstract
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination through Membership Inference Attacks (MIAs) has become critical. We propose SERSEM (Selective Entropy-Weighted Scoring for Membership Inference), a novel white-box attack framework that suppresses uninformative syntactical boilerplate to amplify specific memorization signals. SERSEM utilizes a dual-signal methodology: first, a continuous character-level weight mask is derived through static Abstract Syntax Tree (AST) analysis, spellchecking-based multilingual logic detection, and offline linting. Second, these heuristic weights are used to pool internal transformer activations and calibrate token-level Z-scores from the output logits. Evaluated on a 25,000-sample balanced dataset, SERSEM achieves a global AUC-ROC of 0.7913 on the StarCoder2-3B…
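The abstract's dual-signal idea can be illustrated with a minimal sketch. It assumes the character-level mask is averaged over each token's character span, that a token's Z-score standardizes its log-probability against the model's full next-token distribution, and that both activations and Z-scores are combined by a weighted mean. These are assumptions made for illustration; the abstract does not give the exact formulas, and every function name here is hypothetical.

```python
import math


def char_to_token_weights(char_weights, token_spans):
    """Average the character-level mask over each token's [start, end) span."""
    return [sum(char_weights[s:e]) / (e - s) for s, e in token_spans]


def token_z_score(logits, target_id):
    """Standardize the target token's log-prob against the next-token distribution.

    Assumed form: (log p(target) - mean) / std, where mean and std are taken
    over the vocabulary under the model's own probabilities.
    """
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - lse for l in logits]
    probs = [math.exp(lp) for lp in log_probs]
    mu = sum(p * lp for p, lp in zip(probs, log_probs))
    var = sum(p * (lp - mu) ** 2 for p, lp in zip(probs, log_probs))
    if var <= 1e-12:  # flat distribution: no informative deviation
        return 0.0
    return (log_probs[target_id] - mu) / math.sqrt(var)


def weighted_mean(values, weights):
    """Weighted average of scalars; returns 0.0 if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


def pool_activations(hidden_states, weights):
    """Weighted mean of per-token hidden vectors (one list of floats per token)."""
    dim = len(hidden_states[0])
    return [weighted_mean([h[d] for h in hidden_states], weights) for d in range(dim)]


# Toy example: boilerplate characters get weight 0, anomalous ones weight 1.
char_mask = [0.0, 0.0, 1.0, 1.0, 0.5, 0.5]
spans = [(0, 2), (2, 4), (4, 6)]
w = char_to_token_weights(char_mask, spans)
z = [token_z_score([2.0, 0.0, 0.0], 0) for _ in spans]
print(w, weighted_mean(z, w))
```

Under this sketch, tokens that fall entirely inside boilerplate contribute nothing to the final score, so the membership signal is driven by the tokens the mask marks as informative.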
