Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects
Sebastian Baltes, Stephan Diehl

TL;DR
This study empirically investigates how often Stack Overflow code snippets are used in GitHub projects without proper attribution, revealing low compliance with licensing requirements and limited developer awareness.
Contribution
It provides the first large-scale empirical analysis of SO code snippet usage and attribution in GitHub projects, triangulating an estimate from three different approaches and complementing it with two online developer surveys.
Findings
Only 3.3% to 11.9% of projects reference SO
At most 1.8% of repositories use SO code compatibly with CC BY-SA 3.0
At most about 25% of copied snippets are attributed as required
Abstract
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Using those snippets raises maintenance and legal issues. SO's license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO's license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to…
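As a rough illustration of what "referencing the original question or answer" can look like in practice, the sketch below scans source text for Stack Overflow links. The regular expression, the function name find_so_references, and the sample Java snippet are all assumptions made for this example; this is not the matching procedure used in the study.

```python
import re

# Hypothetical pattern: matches common Stack Overflow link forms
# (/questions/<id>, /q/<id>, /a/<id>). Illustrative assumption only,
# not the rule used by the study's authors.
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(questions|q|a)/(\d+)",
    re.IGNORECASE,
)


def find_so_references(source_text):
    """Return (kind, post_id) pairs for each Stack Overflow link in the text.

    Links of the form /a/<id> are reported as answers; /questions/<id> and
    /q/<id> are reported as questions, even if the full URL also carries an
    answer anchor.
    """
    refs = []
    for match in SO_LINK.finditer(source_text):
        kind = "answer" if match.group(1).lower() == "a" else "question"
        refs.append((kind, int(match.group(2))))
    return refs


# Made-up Java file content with an attribution comment (post id is invented).
java_source = '''
// Adapted from https://stackoverflow.com/a/123456 (CC BY-SA 3.0)
public static String join(List<String> parts) {
    return String.join(",", parts);
}
'''

print(find_so_references(java_source))  # [('answer', 123456)]
```

A link in a comment only shows that a reference exists. It does not establish that the surrounding code was actually copied from that post, nor that the project's license is compatible with CC BY-SA 3.0.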
