SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models

Jiangyi Deng (1), Shengyuan Pang (1), Yanjiao Chen (1), Liangming Xia (1), Yijie Bai (1), Haiqin Weng (2), Wenyuan Xu (1) ((1) Zhejiang University, (2) Ant Group)

arXiv:2404.12699 · cs.LG · April 22, 2024 · 2 cites

TL;DR

This paper introduces SOPHON, a framework that makes pre-trained models resistant to fine-tuning on restricted, potentially unethical tasks while preserving their performance on the original task.

Contribution

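The paper introduces non-fine-tunable learning, a learning paradigm the authors describe as pioneering, which prevents a pre-trained model from being fine-tuned on restricted domains without sacrificing its original-task performance. SOPHON is the protection framework that implements it, with a design inspired by model-agnostic meta-learning.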

Findings

01. SOPHON effectively resists fine-tuning in restricted domains.

02. Fine-tuning a SOPHON-protected model incurs overhead comparable to or higher than training from scratch.

03. SOPHON demonstrates robustness across multiple models, domains, and optimization strategies.

Abstract

Instead of building deep learning models from scratch, developers are more and more relying on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misused for unethical or illegal tasks, e.g., privacy inference and unsafe content generation. In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-trained model from being fine-tuned to indecent tasks while preserving its performance on the original task. To fulfill this goal, we propose SOPHON, a protection framework that reinforces a given pre-trained model to be resistant to being fine-tuned in pre-defined restricted domains. Nonetheless, this is challenging due to a diversity of complicated fine-tuning strategies that may be adopted by adversaries. Inspired by model-agnostic meta-learning, we overcome this difficulty by designing…

Code & Models

Repositories

chiange/sophon (PyTorch, official)

Taxonomy

Topics: Machine Learning in Healthcare · COVID-19 diagnosis using AI · Speech Recognition and Synthesis