Beyond Dataset Watermarking: Model-Level Copyright Protection for Code Summarization Models
Jiale Zhang, Haoxuan Li, Di Wu, Xiaobing Sun, Qinghua Lu, Guodong Long

TL;DR
This paper introduces ModMark, a model-level digital watermarking method for code summarization models that enhances copyright protection across multiple programming languages while maintaining concealment and robustness.
Contribution
ModMark is a novel watermarking technique that uses tokenizer fine-tuning for cross-language generalization and code noise injection for concealment, addressing limitations of prior dataset and style-based methods.
Findings
Achieves 100% watermark verification rate across various languages.
Demonstrates robustness against automated detection.
Maintains concealment and effectiveness of watermarks.
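The summary does not say how the verification rate is measured. As a rough sketch only, assuming a trigger-based definition (the share of trigger inputs whose generated summary contains the owner's signature), it could be computed as below. The names `verification_rate`, `toy_model`, the `# wm` trigger, and the `[sig]` signature are all hypothetical and are not taken from ModMark.

```python
from typing import Callable, Iterable


def verification_rate(
    summarize: Callable[[str], str],
    trigger_inputs: Iterable[str],
    signature: str,
) -> float:
    """Return the fraction of trigger inputs whose summary contains the signature."""
    inputs = list(trigger_inputs)
    if not inputs:
        raise ValueError("need at least one trigger input")
    hits = sum(1 for code in inputs if signature in summarize(code))
    return hits / len(inputs)


# Toy stand-in for a watermarked model (hypothetical, for illustration only):
# it appends a signature whenever the assumed trigger "# wm" is present.
def toy_model(code: str) -> str:
    if "# wm" in code:
        return "Returns the sum of two numbers. [sig]"
    return "Returns the sum of two numbers."


triggers = [
    "def add(a, b):  # wm\n    return a + b",
    "function add(a, b) { return a + b; } // # wm",
]
rate = verification_rate(toy_model, triggers, "[sig]")
print(f"{rate:.0%}")
```

Under this assumed definition, a 100% rate means every trigger input produced a summary carrying the signature; inputs without the trigger should not.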
Abstract
Code Summarization Models (CSMs) have been widely used in code production, such as online and web programming for PHP and JavaScript. CSMs are essential tools in code production, enhancing software development efficiency and driving innovation in automated code analysis. However, CSMs face risks of exploitation by unauthorized users, particularly in online environments where CSMs can be easily shared and disseminated. To address these risks, digital watermarks offer a promising solution by embedding imperceptible signatures within the models to assert copyright ownership and track unauthorized usage. Traditional watermarking for CSM copyright protection faces two main challenges: 1) dataset watermarking methods require separate design of triggers and watermark features based on the characteristics of different programming languages, which not only increases the computational complexity but…
Taxonomy
Topics: Digital Rights Management and Security · Advanced Data Storage Technologies · Digital and Cyber Forensics
