E-SC4R: Explaining Software Clustering for Remodularisation
Alvin Jian Jia Tan, Chun Yong Chong, Aldeida Aleti

TL;DR
This paper introduces a framework to evaluate and explain the effectiveness of hierarchical and Bunch clustering algorithms in software remodularisation, aiding in selecting suitable methods based on software features.
Contribution
It proposes a new approach for assessing and explaining how well software clustering algorithms suit diverse software systems, which helps researchers address the external validity of their evaluations.
Findings
Characterises strengths and weaknesses of clustering algorithms using software features.
Demonstrates the framework on 30 open source systems with varying sizes and domains.
Uses dimensionality reduction to improve understanding of algorithm behaviour.
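To make the idea of clustering software entities by their features concrete, here is a minimal sketch of agglomerative (hierarchical) clustering, one of the two algorithm families named in the TL;DR. It is not the paper's implementation. The feature sets, the entity names, the Jaccard distance, and the average-linkage rule are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(features, num_clusters):
    """Average-linkage agglomerative clustering over feature sets.

    features: dict mapping an entity name to a set of feature tokens
              (a hypothetical input format, e.g. the types a class references).
    num_clusters: number of clusters at which merging stops.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")

    # Start with every entity in its own cluster.
    clusters = [[name] for name in sorted(features)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > num_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(clusters)


# Hypothetical toy system: each class is described by the types it references.
toy_system = {
    "OrderService": {"Order", "Invoice", "Database"},
    "InvoiceService": {"Invoice", "Order", "Database"},
    "HtmlRenderer": {"Template", "Css"},
    "PdfRenderer": {"Template", "Font"},
}

modules = agglomerative_clusters(toy_system, 2)
print(modules)
```

On this toy input the two service classes end up in one cluster and the two renderers in the other. A different distance measure or linkage rule could produce different groupings, which is the kind of algorithm-dependent behaviour the framework sets out to explain.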
Abstract
Maintaining existing software requires a large amount of time for comprehending the source code. The architecture of a software system, however, may not be clear to maintainers if up-to-date documentation is not available. Software clustering is often used as a remodularisation and architecture recovery technique to help recover a semantic representation of the software design. Because software systems differ widely in domain, structure, and behaviour, the suitability of different clustering algorithms for different software systems has not been investigated thoroughly. Studies that introduce new clustering techniques usually validate their approaches on a specific domain, which might limit their generalisability. If the chosen test subjects represent only a narrow perspective of the whole picture, researchers risk being unable to address the external validity of their findings.…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Software Engineering Methodologies
