How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects?
Juntong Chen, Yan Zhao, Na Meng

TL;DR
This study empirically investigates how Java developers reuse StackOverflow answers in GitHub projects. It characterizes the reused answers and the ways developers adapt them, and it offers insights for improving answer quality.
Contribution
It introduces a hybrid method combining clone detection, keyword search, and manual inspection to identify reused SO answers in GitHub Java projects, offering new insights into reuse behaviors.
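The first two stages of such a hybrid pipeline can be illustrated with a minimal Java sketch. This is not the authors' tooling: the class name, the token-shingle containment measure (standing in for a real clone detector), the "stackoverflow.com" keyword, and the shingle size and threshold passed in by the caller are all assumptions made for illustration. The sketch flags a file for manual inspection only when it both mentions StackOverflow and shares most of an answer snippet's token sequences.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the paper.
class ReuseCandidateFinder {
    // Identifiers, integer literals, or any single non-whitespace symbol.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\\S");

    // Keyword-based search: does the file mention StackOverflow at all?
    static boolean mentionsStackOverflow(String source) {
        return source.toLowerCase(Locale.ROOT).contains("stackoverflow.com");
    }

    // Splits code into tokens and returns the set of k-token shingles.
    static Set<String> shingles(String code, int k) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + k)));
        }
        return result;
    }

    // Simplified stand-in for clone detection: the fraction of the
    // snippet's shingles that also occur in the file.
    static double containment(String snippet, String file, int k) {
        Set<String> snippetShingles = shingles(snippet, k);
        if (snippetShingles.isEmpty()) {
            return 0.0;
        }
        Set<String> fileShingles = shingles(file, k);
        int shared = 0;
        for (String s : snippetShingles) {
            if (fileShingles.contains(s)) {
                shared++;
            }
        }
        return (double) shared / snippetShingles.size();
    }

    // Both automated signals must agree before a human looks at the pair.
    static boolean needsManualInspection(String snippet, String file,
                                         int k, double threshold) {
        return mentionsStackOverflow(file)
                && containment(snippet, file, k) >= threshold;
    }

    public static void main(String[] args) {
        String snippet = "int x = a + b;";
        // The comment below is a made-up example of a source attribution.
        String file = "// adapted from stackoverflow.com\nint x = a + b;\nreturn x;";
        System.out.println(needsManualInspection(snippet, file, 3, 0.8));
    }
}
```

Requiring both signals keeps the manual-inspection workload small, at the cost of missing reuse that carries no StackOverflow reference; a real study would tune that trade-off and use a dedicated clone detector.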
Findings
Most reused answers provide code for specific tasks.
Reused answers tend to be older and to have higher scores.
Only 9% of code snippets are copied in full; most are copied in part or rewritten.
Abstract
StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. Prior work relates the information mined from both platforms to link user accounts or compare developers' activities across platforms. However, not much work is done to characterize the SO answers reused by GitHub projects. For this paper, we did an empirical study by mining the SO answers reused by Java projects available on GitHub. We created a hybrid approach of clone detection, keyword-based search, and manual inspection, to identify the answer(s) actually leveraged by developers. Based on the identified answers, we further studied topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers'…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
