Monolith Development History for Microservices Identification: a Comparative Analysis
João Lourenço, António Rito Silva

TL;DR
This study compares different monolith codebase representations for microservices identification, finding that combining access sequences with development history yields the best modularity and team reduction results across diverse codebases.
Contribution
It introduces a comparative analysis of monolith representations based on code access sequences and development history for microservices identification.
Findings
Combining access sequences and development history improves decomposition quality.
Authorship-based representation performs well in multi-author codebases.
Different representations are effective depending on codebase characteristics.
Abstract
Recent research has proposed different approaches to the automated identification of candidate microservices in monolith systems, which vary in the monolith representation, similarity criteria, and quality metrics used. However, they are generally limited in the number of codebases and decompositions evaluated, and few comparisons between approaches exist. Given the emerging trend in software engineering toward techniques based on the analysis of codebases' evolution, we compare a representation based on the monolith code structure, in particular the sequences of accesses to domain entities, with representations based on the monolith development history (file changes and change authorship). From the analysis of a total of 468k decompositions of 28 codebases, using five quality metrics that evaluate modularity, minimization of the number of transactions per functionality,…
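To make the idea of a development-history representation concrete, here is a minimal sketch of how a file-change similarity could be derived from commits and blended with another similarity, such as one based on access sequences. It is not the paper's actual measure: the co-change ratio, the function names `cochange_similarity` and `combine`, and the `weight` parameter are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cochange_similarity(commits):
    """Assumed measure: sim[(a, b)] = commits touching both a and b / commits touching a.

    `commits` is an iterable of sets, each holding the entities (or files)
    changed together in one commit.
    """
    changes = defaultdict(int)   # how many commits touched each entity
    together = defaultdict(int)  # how many commits touched each unordered pair
    for changed in commits:
        for entity in changed:
            changes[entity] += 1
        for a, b in combinations(sorted(changed), 2):
            together[(a, b)] += 1

    sim = {}
    for (a, b), n in together.items():
        sim[(a, b)] = n / changes[a]  # asymmetric: normalized by the first entity
        sim[(b, a)] = n / changes[b]
    return sim


def combine(sim_a, sim_b, weight=0.5):
    """Hypothetical linear blend of two similarity maps; missing pairs count as 0."""
    keys = set(sim_a) | set(sim_b)
    return {
        k: weight * sim_a.get(k, 0.0) + (1 - weight) * sim_b.get(k, 0.0)
        for k in keys
    }


commits = [{"Order", "Customer"}, {"Order", "Customer"}, {"Order", "Invoice"}]
history_sim = cochange_similarity(commits)
access_sim = {("Order", "Customer"): 1.0}  # made-up access-sequence similarity
combined = combine(history_sim, access_sim, weight=0.5)
```

The resulting pairwise similarities could then be fed to any clustering algorithm to produce candidate decompositions. The choice of clustering and of the blend weight is left open here.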
Taxonomy
Topics: Software System Performance and Reliability · Data Quality and Management · Cloud Computing and Resource Management
