StackOverflow vs Kaggle: A Study of Developer Discussions About Data Science
David Hin

TL;DR
This study compares developer discussions of data science, machine learning, and deep learning on StackOverflow and Kaggle, revealing differences in focus, topics, and trends between the two platforms.
Contribution
The paper provides a large-scale analysis of 197,836 posts, using Latent Dirichlet Allocation topic modelling to identify key discussion themes and the differences between the two communities.
Findings
TensorFlow-related topics are the most prevalent on StackOverflow.
Kaggle discussions focus more on meta and practical aspects.
Interest in Keras is rising while TensorFlow discussions are slowing down.
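Findings such as "most prevalent" and "rising" depend on how a topic's prevalence and its trend are measured. This summary does not give the paper's definitions, so the following is only a sketch that assumes two common choices: a topic's prevalence on a platform is the mean of its per-post topic proportions, and its trend is the ordinary least-squares slope of that share over time periods. The function names topic_prevalence and trend_slope and all input values are hypothetical toy data, not results from the study.

```python
from collections import defaultdict
from statistics import fmean


def topic_prevalence(doc_topics, platforms):
    """Assumed metric: mean topic proportion per platform (hypothetical helper).

    doc_topics: per-post topic distributions, e.g. rows of an LDA theta matrix.
    platforms: platform label for each post, aligned with doc_topics.
    Returns {platform: [mean share of topic k for each topic k]}.
    """
    grouped = defaultdict(list)
    for dist, platform in zip(doc_topics, platforms):
        grouped[platform].append(dist)
    # zip(*dists) turns the list of per-post rows into one column per topic.
    return {
        platform: [fmean(column) for column in zip(*dists)]
        for platform, dists in grouped.items()
    }


def trend_slope(periods, shares):
    """Assumed trend measure: least-squares slope of a topic's share over time."""
    mean_x, mean_y = fmean(periods), fmean(shares)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, shares))
    den = sum((x - mean_x) ** 2 for x in periods)
    return num / den


# Toy inputs for illustration only: two topics, three posts, three periods.
prevalence = topic_prevalence(
    [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]],
    ["StackOverflow", "StackOverflow", "Kaggle"],
)
slope = trend_slope([2016, 2017, 2018], [0.125, 0.25, 0.375])
print(prevalence)  # {'StackOverflow': [0.625, 0.375], 'Kaggle': [0.25, 0.75]}
print(slope)       # 0.125
```

Under these assumptions, a positive slope would correspond to a "rising" topic and a flattening or negative slope to one that is "slowing down"; the study may well use a different definition.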
Abstract
Software developers are increasingly required to understand fundamental data science (DS) concepts. Recently, the presence of machine learning (ML) and deep learning (DL) has dramatically increased in the development of user applications, whether they are leveraged through frameworks or implemented from scratch. These topics attract much discussion on online platforms. This paper conducts large-scale qualitative and quantitative experiments to study the characteristics of 197,836 posts from StackOverflow and Kaggle. Latent Dirichlet Allocation topic modelling is used to extract twenty-four DS discussion topics. The main findings include that TensorFlow-related topics were most prevalent on StackOverflow, while meta discussion topics were most prevalent on Kaggle. StackOverflow tends to include lower-level troubleshooting, while Kaggle focuses on practicality and optimising…
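The abstract names Latent Dirichlet Allocation (LDA) as the model that yields the twenty-four topics, but it does not state the implementation, priors, or preprocessing. The following is therefore only a minimal sketch, assuming collapsed Gibbs sampling with symmetric Dirichlet priors alpha and beta; the function lda_gibbs, its default values, and the toy posts are hypothetical. Each post ends up as a mixture over topics (theta) and each topic as a distribution over words (phi). In a study of this kind, num_topics would be 24 and topics would typically be labelled by inspecting their top words.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors (hypothetical helper).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w has topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    z = []  # current topic assignment of every token

    # Start from a random topic for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a normalising constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates from the final sample.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + vocab_size * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical toy posts; the study's own corpus and preprocessing are not reproduced.
posts = [
    "tensorflow gpu install error",
    "tensorflow keras gpu error",
    "kaggle competition leaderboard score",
    "kaggle leaderboard submission score",
]
vocab = sorted({word for post in posts for word in post.split()})
word_id = {word: i for i, word in enumerate(vocab)}
docs = [[word_id[word] for word in post.split()] for post in posts]

theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)[:3]
    print(k, [vocab[w] for w in top])
```

Gibbs sampling is used here only because it needs nothing beyond the standard library; variational inference, which several widely used LDA libraries implement, is an equally valid way to fit the same model. On a corpus this small the sampled topics can change with the seed.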
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Online Learning and Analytics
