How Open Should Open Source Be?
Adam Barth, Saung Li, Benjamin I. P. Rubinstein, Dawn Song

TL;DR
This paper demonstrates that open-source projects' patch metadata can be exploited to identify security fixes before release, suggesting that keeping patches secret until release enhances security.
Contribution
The study shows that patch metadata in public open-source repositories leaks which patches are security fixes, and argues that keeping security patches out of public view until release is the more effective defense.
Findings
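Using access-restricted bug reports linked from patch descriptions, security patches can be identified immediately for 260 out of 300 days of Firefox 3 development.
After Mozilla obfuscated patch descriptions, machine learning on metadata such as patch author extends the total window of vulnerability by 5 months in an 8-month period when examining up to two patches daily.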
Obfuscating metadata alone is insufficient; keeping patches secret until release is more effective.
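The metadata-ranking finding above can be illustrated with a small sketch. It is not the paper's classifier: it swaps in a simple smoothed per-author frequency estimate as a stand-in, and the names `author_scores`, `daily_shortlist`, `alpha`, and `budget` are hypothetical. It only shows the shape of the search, in which an attacker scores each day's patches from metadata and inspects the top two by hand.

```python
from collections import Counter


def author_scores(history, alpha=1.0):
    """Build a scoring function from past labeled patches.

    history: iterable of (author, is_security) pairs (assumed to be available
    from earlier, already disclosed fixes).
    alpha: smoothing strength (hypothetical parameter); pulls authors with
    little history toward the overall security-patch rate.
    """
    total = Counter()
    security = Counter()
    for author, is_security in history:
        total[author] += 1
        if is_security:
            security[author] += 1

    n_total = sum(total.values())
    prior = sum(security.values()) / n_total if n_total else 0.0

    def score(author):
        # Smoothed fraction of this author's past patches that were security
        # fixes; an unseen author receives exactly the overall prior.
        return (security[author] + alpha * prior) / (total[author] + alpha)

    return score


def daily_shortlist(patches, score, budget=2):
    """Return the `budget` highest-scoring patches landed on one day."""
    ranked = sorted(patches, key=lambda p: score(p["author"]), reverse=True)
    return ranked[:budget]


# Toy data, invented for illustration only.
history = [
    ("alice", True), ("alice", True), ("alice", False),
    ("bob", False), ("bob", False),
    ("carol", False),
]
today = [
    {"id": "p1", "author": "bob"},
    {"id": "p2", "author": "alice"},
    {"id": "p3", "author": "dave"},
    {"id": "p4", "author": "carol"},
]
shortlist = daily_shortlist(today, author_scores(history), budget=2)
```

With this toy history, the shortlist contains the patch by `alice`, who has a record of security fixes, followed by the patch from the unseen author `dave`, who falls back to the overall rate. A real attack would use more metadata than the author alone.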
Abstract
Many open-source projects land security fixes in public repositories before shipping these patches to users. This paper presents attacks on such projects - taking Firefox as a case study - that exploit patch metadata to efficiently search for security patches prior to shipping. Using access-restricted bug reports linked from patch descriptions, security patches can be immediately identified for 260 out of 300 days of Firefox 3 development. In response to Mozilla obfuscating descriptions, we show that machine learning can exploit metadata such as patch author to search for security patches, extending the total window of vulnerability by 5 months in an 8-month period when examining up to two patches daily. Finally, we present strong evidence that further metadata obfuscation is unlikely to prevent information leaks, and we argue that open-source projects instead ought to keep security…
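The first attack in the abstract, following bug links from patch descriptions and checking whether the linked report is access-restricted, can be sketched as below. This is a minimal illustration under stated assumptions: the `Bug <number>` reference format, the `referenced_bugs` and `flag_candidates` helpers, and the `is_restricted` callback are all hypothetical, and no real bug-tracker API is called.

```python
import re

# Assumed reference format: "Bug 1234" or "bug #1234" in the patch description.
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)


def referenced_bugs(description):
    """Extract the bug numbers mentioned in a patch description."""
    return [int(number) for number in BUG_REF.findall(description)]


def flag_candidates(patches, is_restricted):
    """Return ids of patches that link to at least one restricted bug report.

    patches: iterable of (patch_id, description) pairs.
    is_restricted: caller-supplied callable (hypothetical) that reports
    whether a bug number is hidden from the public.
    """
    flagged = []
    for patch_id, description in patches:
        if any(is_restricted(bug) for bug in referenced_bugs(description)):
            flagged.append(patch_id)
    return flagged


# Toy data, invented for illustration only.
patches = [
    ("a1", "Bug 1001 - fix crash in parser"),
    ("a2", "Bug 1002 - update docs"),
    ("a3", "whitespace cleanup, no bug"),
]
restricted = {1001}
candidates = flag_candidates(patches, lambda bug: bug in restricted)
```

Here only `a1` is flagged, because it is the only patch whose linked report is hidden. The check uses nothing beyond the description text and the visibility of the report, which is why it stops working once descriptions are obfuscated.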
Taxonomy
Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Adversarial Robustness in Machine Learning
