Automatic derivation of domain terms and concept location based on the analysis of the identifiers
Peter Vaclavik, Jaroslav Poruban, Marek Mezei

TL;DR
This paper introduces an automated method to extract domain-specific vocabulary from code identifiers and comments, aiding understanding and maintenance by identifying key concepts and their locations within software systems.
Contribution
It presents a novel approach for automatically deriving domain terms and locating concepts in code, demonstrated through experiments on various application server and web framework projects.
Findings
Domain vocabulary extraction improves code comprehension.
Automated concept location facilitates maintenance tasks.
Applied in experiments on projects from two domains: application servers and web frameworks.
Abstract
Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to automatically create a domain vocabulary. Knowledge of the domain vocabulary supports comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms enables easy localization of the chunks of code that belong to a certain term. We consider these chunks of code as "concepts" and their placement in the code as "concept location". Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in "classes" (OO…
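The idea in the abstract, deriving terms from identifiers and mapping each term back to the code that uses it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the splitting rule for camelCase and snake_case identifiers, and the sample class data are all hypothetical, and the part-of-speech analysis mentioned in the abstract is left out.

```python
import re
from collections import Counter, defaultdict


def split_identifier(identifier):
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase words.

    Assumption: acronyms such as "HTTP" are kept as a single word.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def build_vocabulary(identifiers_by_class):
    """Count candidate domain terms and record which classes mention them.

    `identifiers_by_class` maps a class name to the identifiers declared in it
    (a hypothetical input format chosen for this sketch).
    Returns (term frequencies, term -> set of class names).
    """
    frequency = Counter()
    locations = defaultdict(set)
    for class_name, identifiers in identifiers_by_class.items():
        # The class name itself is also an identifier that carries domain terms.
        for identifier in [class_name, *identifiers]:
            for term in split_identifier(identifier):
                frequency[term] += 1
                locations[term].add(class_name)
    return frequency, locations


# Hypothetical sample input, loosely in the style of a web framework.
sample = {
    "SessionManager": ["createSession", "session_timeout", "invalidateSession"],
    "HTTPRequestHandler": ["handleRequest", "requestHeaders"],
}

frequency, locations = build_vocabulary(sample)
print(frequency.most_common(2))
print(sorted(locations["session"]))
```

In this sketch the frequency table plays the role of a candidate domain vocabulary, and the term-to-class index is a crude form of concept location: looking up a term returns the classes where that concept is likely implemented.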
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Web Data Mining and Analysis
