Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding, Tong Zhou, Lili Su, Aidong Adam Ding, Xiaolin Xu, Yunsi Fei

TL;DR
This paper introduces EncoderLock, a method that protects pre-trained encoders from malicious probing: it degrades their performance on prohibited domains while preserving their utility on authorized tasks, in support of responsible AI practices.
Contribution
EncoderLock is a novel applicability authorization technique that employs domain-aware weight selection and a self-challenging training scheme to defend against malicious probing of pre-trained encoders.
Findings
EncoderLock effectively restricts encoder use on prohibited domains.
It maintains high performance on authorized domains.
Its effectiveness is demonstrated on a real-world Vision Transformer (ViT) encoder.
Abstract
Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers to cope with limited computational resources and data volume. More specifically, probing--training a downstream head on a pre-trained encoder--has been widely adopted in transfer learning, which helps to prevent overfitting and catastrophic forgetting. However, such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful intentions, such as discriminatory speculation and warfare applications. In this work, we introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing, i.e., yielding poor performance on specified prohibited domains while maintaining their utility in authorized ones. Achieving this balance is challenging because of the opposite optimization…
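To make the term "probing" concrete, here is a minimal sketch of a linear probe: a small head is trained on the outputs of an encoder whose weights stay fixed. Everything in it is an illustrative assumption and not taken from the paper: `frozen_encoder` is a hypothetical stand-in with hard-coded weights, the data are toy Gaussian blobs, and the head is a plain logistic regression. It does not implement EncoderLock; it only shows the probing setup that EncoderLock is meant to defend against.

```python
import math
import random


def frozen_encoder(x):
    """Hypothetical stand-in for a pre-trained encoder; its weights are never updated."""
    weights = ((0.9, -0.4), (0.3, 0.8), (-0.5, 0.6))  # fixed, illustrative values
    return [math.tanh(w0 * x[0] + w1 * x[1]) for w0, w1 in weights]


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on frozen features with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for z, y in zip(features, labels):
            logit = sum(wi * zi for wi, zi in zip(w, z)) + b
            err = 1.0 / (1.0 + math.exp(-logit)) - y  # sigmoid(logit) - label
            for i in range(dim):
                grad_w[i] += err * z[i]
            grad_b += err
        # Only the head (w, b) is updated; the encoder is untouched.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples the probe classifies correctly."""
    correct = 0
    for z, y in zip(features, labels):
        logit = sum(wi * zi for wi, zi in zip(w, z)) + b
        correct += int((1 if logit > 0 else 0) == y)
    return correct / len(labels)


def make_toy_data(n, rng):
    """Two Gaussian blobs in 2D, one per class (toy downstream task)."""
    inputs, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        inputs.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
        labels.append(y)
    return inputs, labels


rng = random.Random(0)
inputs, labels = make_toy_data(200, rng)
features = [frozen_encoder(x) for x in inputs]  # encoder is used as-is
w, b = train_probe(features, labels)
acc = probe_accuracy(w, b, features, labels)
print(f"probe accuracy on the toy task: {acc:.2f}")
```

In this picture, an applicability authorization method such as EncoderLock would aim to make `probe_accuracy` low when the toy task comes from a prohibited domain, while keeping it high for authorized ones.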
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security
