SOCluster - Towards Intent-based Clustering of Stack Overflow Questions using Graph-Based Approach
Abhishek Kumar, Deep Ghadiyali, Sridhar Chimalakonda

TL;DR
This paper introduces SOCluster, a graph-based clustering tool that groups Stack Overflow questions by intent to improve question answering efficiency and reduce unanswered questions.
Contribution
The paper presents a novel intent-based clustering approach for Stack Overflow questions using a graph-based method and evaluates it across multiple datasets with promising results.
Findings
Optimal clustering occurs at a 90% similarity threshold across the datasets
Clusters show meaningful grouping based on intent
Evaluation metrics indicate good clustering quality
Abstract
The Stack Overflow (SO) platform has a huge dataset of questions and answers driven by interactions between users, but the count of unanswered questions is continuously rising. This issue is common across community Question & Answering (Q&A) platforms such as Yahoo, Quora and so on. Clustering is one of the approaches these communities use to address this challenge. Specifically, intent-based clustering could be leveraged to answer unanswered questions using other answered questions in the same cluster, and could also improve the response time for new questions. To this end, we propose SOCluster, an approach and a tool to cluster SO questions based on intent using a graph-based clustering approach. We selected four datasets of 10k, 20k, 30k and 40k SO questions without code snippets or images, and performed intent-based clustering on them. We have done a preliminary…
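Illustration
The idea of graph-based clustering under a similarity threshold can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not state how SOCluster measures similarity or extracts clusters from the graph, so the bag-of-words cosine similarity, the use of connected components as clusters, and the names `bag_of_words`, `cosine` and `cluster_questions` are all assumptions made for illustration. The default threshold of 0.9 mirrors the 90% value listed under Findings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_questions(questions, threshold=0.9):
    """Treat questions as graph nodes, link pairs whose similarity reaches
    the threshold, and return connected components as lists of indices."""
    vectors = [bag_of_words(q) for q in questions]
    parent = list(range(len(questions)))  # union-find forest over nodes

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Hypothetical edge rule: an edge exists when similarity >= threshold.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(questions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    sample = [
        "How do I reverse a list in Python?",
        "how do i reverse a list in python",
        "What is a segmentation fault in C?",
    ]
    print(cluster_questions(sample))
```

The pairwise comparison is quadratic in the number of questions, so a sketch like this would need an indexing or blocking step before it could handle datasets of the sizes mentioned in the abstract.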
Taxonomy
Topics: Expert finding and Q&A systems · Text and Document Classification Technologies · Software Engineering Research
