
TL;DR
This paper introduces a lightweight, model-agnostic LoRA-based framework for backdoor detection and membership inference in neural networks. It analyzes low-rank adaptation dynamics and does not need access to the original training data.
Contribution
It presents a novel LoRA-based oracle method that identifies poisoned samples and training-set members through the low-rank updates they induce, avoiding retraining and strong assumptions about the attack mechanism.
Findings
Distinguishes poisoned and member samples via low-rank signals
Operates without access to original training data
Works across different models and attack types
Abstract
Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured…
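The abstract is truncated before it says how the low-rank signals are measured, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a single frozen linear layer `W`, a rank-r adapter with effective weights `W + B A` (with `B` initialized to zero, as is standard for LoRA), a squared-error loss, and, as a hypothetical probe signal, the Frobenius norm of the loss gradient with respect to `B` for one sample. The function name `lora_update_score` and the toy values are illustrative assumptions.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_update_score(W, A, x, y):
    """Hypothetical probe score: gradient norm on LoRA matrix B at B = 0.

    Effective weights are W + B @ A with the backbone W frozen.
    For the loss 0.5 * ||(W + B A) x - y||^2 the gradient w.r.t. B is
    outer(residual, A x), whose Frobenius norm is ||residual|| * ||A x||.
    This scoring rule is an assumption, not taken from the paper.
    """
    residual = [p - t for p, t in zip(matvec(W, x), y)]
    ax = matvec(A, x)
    res_norm = math.sqrt(sum(r * r for r in residual))
    ax_norm = math.sqrt(sum(a * a for a in ax))
    return res_norm * ax_norm


# Toy setup: 2x2 frozen backbone and a rank-1 down-projection A.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]
x = [1.0, 2.0]

score_fit = lora_update_score(W, A, x, [1.0, 2.0])  # backbone already fits this sample
score_off = lora_update_score(W, A, x, [0.0, 2.0])  # backbone misses this sample
print(score_fit, score_off)
```

In this toy setting a sample the frozen backbone already fits produces no adapter update, while a sample it misses produces a nonzero one. Whether poisoned or member samples yield larger or smaller updates in practice, and which statistic separates them best, is not stated in the visible part of the abstract.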
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Privacy-Preserving Technologies in Data
