Programming Language Co-Usage Patterns on Stack Overflow: Analysis of the Developer Ecosystem
Bachan Ghimire, Nitin Gupta

TL;DR
This paper analyzes how developers co-use programming languages on Stack Overflow, revealing core language clusters, developer profiles, and macro-communities through a multi-method empirical approach.
Contribution
It introduces a novel three-phase pipeline combining frequent itemset mining, topic modeling, and community detection to characterize the software ecosystem from behavioral data.
Findings
Identifies tightly coupled language clusters such as shell/bash, Swift/Objective-C, and the C-family.
Discovers 25 developer profiles including Apple-platform and scientific programmers.
Partitions languages into web/enterprise, Apple, and scientific communities with Java as a hub.
Abstract
Understanding how developers combine programming languages in practice reveals the hidden structure of the software ecosystem: which languages are used as complements, which define coherent technology stacks, and which bridge disparate communities. We present a three-phase empirical pipeline that mines Stack Overflow posts by hundreds of thousands of developers across 186 programming languages, applying FP-Growth frequent itemset mining, Latent Dirichlet Allocation topic modeling, and Louvain community detection on a weighted co-usage graph, with the goal of characterizing co-usage coupling, latent developer specializations, and macro-level ecosystem structure simultaneously from behavioral data. FP-Growth identifies tight coupling clusters such as shell/bash, Swift/Objective-C, and the C-family with lift values far exceeding what individual language popularity predicts. LDA produces 25…
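The lift measure mentioned in the abstract compares how often two languages are observed together with how often they would co-occur if developers chose them independently: lift(A, B) = support(A, B) / (support(A) × support(B)). The sketch below illustrates that calculation only. The `developers` list is invented toy data, not the paper's dataset; `pair_lift` is a hypothetical helper; and plain pairwise counting stands in for FP-Growth, which the paper uses to mine larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT from the paper): each set holds the languages
# that one developer has posted about.
developers = [
    {"swift", "objective-c"},
    {"swift", "objective-c", "java"},
    {"java", "javascript"},
    {"java", "javascript", "python"},
    {"python", "r"},
    {"java", "python"},
    {"javascript", "python"},
    {"bash", "shell", "python"},
]


def pair_lift(baskets):
    """Return {(lang_a, lang_b): (support, lift)} for every co-occurring pair.

    Illustrative helper: brute-force pair counting, not FP-Growth.
    """
    n = len(baskets)
    # Number of developers who use each individual language.
    single = Counter(lang for basket in baskets for lang in basket)
    # Number of developers who use both languages of each pair.
    pairs = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    result = {}
    for (a, b), count in pairs.items():
        support = count / n
        expected = (single[a] / n) * (single[b] / n)  # under independence
        result[(a, b)] = (support, support / expected)
    return result


stats = pair_lift(developers)
# Print pairs from highest to lowest lift.
for pair, (support, lift) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
    print(pair, f"support={support:.3f}", f"lift={lift:.2f}")
```

A lift well above 1 marks a pair that co-occurs more often than popularity alone would predict, which is the sense in which the abstract describes pairs such as Swift/Objective-C as tightly coupled. A lift below 1 marks a pair that appears together less often than expected, even when both languages are individually popular.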
