Toxic Code Snippets on Stack Overflow
Chaiyong Ragkhitwetsagul, Jens Krinke, Matheus Paixao, Giuseppe Bianco, Rocco Oliveto

TL;DR
This study investigates the prevalence and risks of toxic code snippets on Stack Overflow, revealing issues like outdated, buggy, and license-violating code through large-scale clone detection and surveys.
Contribution
It provides the first large-scale analysis of online code clones on Stack Overflow, highlighting toxicity issues and potential license violations with empirical evidence.
Findings
66% of clones from open source are outdated
10 clones were buggy and harmful for reuse
214 snippets may violate original licenses
Abstract
Online code clones are code fragments that are copied from software projects or online sources to Stack Overflow as examples. Due to an absence of a checking mechanism after the code has been copied to Stack Overflow, they can become toxic code snippets, e.g., they suffer from being outdated or violating the original software license. We present a study of online code clones on Stack Overflow and their toxicity by incorporating two developer surveys and a large-scale code clone detection. A survey of 201 high-reputation Stack Overflow answerers (33% response rate) showed that 131 participants (65%) have ever been notified of outdated code and 26 of them (20%) rarely or never fix the code. 138 answerers (69%) never check for licensing conflicts between their copied code snippets and Stack Overflow's CC BY-SA 3.0. A survey of 87 Stack Overflow visitors shows that they experienced several…
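The survey percentages in the abstract use two different denominators: the 65% and 69% figures are shares of all 201 respondents, while the 20% figure is a share of the 131 answerers who had been notified of outdated code. A minimal sketch that rechecks this arithmetic follows; the helper `pct` and the variable names are illustrative assumptions, and only the counts come from the abstract.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts quoted in the abstract (answerer survey).
respondents = 201          # high-reputation answerers who responded
notified = 131             # have been notified of outdated code
rarely_or_never_fix = 26   # subset of `notified` who rarely or never fix it
never_check_license = 138  # never check for licensing conflicts

# Share of all respondents who were notified of outdated code.
notified_pct = pct(notified, respondents)

# Share of the *notified* group (not of all respondents) who rarely or never fix.
rarely_fix_pct = pct(rarely_or_never_fix, notified)

# Share of all respondents who never check licensing conflicts.
never_check_pct = pct(never_check_license, respondents)

print(notified_pct, rarely_fix_pct, never_check_pct)
```

Measured against all 201 respondents instead, the 26 answerers who rarely or never fix outdated code would come to roughly 13%, so the denominator matters when quoting the figure.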
