Interactive Duplicate Search in Software Documentation
D.V. Luciv, D.V. Koznov, A.A. Shelikhovskii, K.Yu. Romanovsky, G.A. Chernishev, A.N. Terekhov, D.A. Grigoriev, A.N. Smirnova, D.V. Borovkov, A.I. Vasenina

TL;DR
This paper introduces an interactive method for detecting duplicates in software documentation, combining formal definitions, pattern-based techniques, and user involvement to improve documentation maintenance and reuse.
Contribution
It presents a novel interactive duplicate detection approach with a new formal definition and proof of completeness, validated on industrial project documents.
Findings
Effective duplicate detection in software documentation demonstrated
Interactive process improves search relevance and accuracy
Validated on multiple industrial project collections
Abstract
Software features such as classes, methods, requirements, and tests often have similar functionality, which can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder documentation maintenance, so detecting duplicates in software documentation is an important task. Solving it makes planned reuse possible, along with the creation and use of templates for unifying and automatically generating documentation. In this paper, we present an interactive process for duplicate detection that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search technique, and a proof of its completeness. Moreover, we present the results of experiments on a collection of documents from several industrial projects.
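To make the idea of a user-driven, pattern-based search concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not use their formal definition of a near duplicate, which this page does not reproduce. It assumes the user supplies a text pattern, and it substitutes the standard library's `difflib.SequenceMatcher` ratio over word windows for the paper's similarity criterion. The function name `find_near_duplicates` and the `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def find_near_duplicates(text, pattern, threshold=0.8):
    """Return (word_offset, similarity, fragment) for text windows resembling pattern.

    Assumption: similarity is difflib's ratio over word sequences, used here
    as a stand-in for the paper's formal near-duplicate definition.
    """
    words = text.split()
    pattern_words = pattern.split()
    size = len(pattern_words)
    if size == 0:
        return []

    # Score every window of the same word length as the pattern.
    hits = []
    for start in range(max(len(words) - size + 1, 0)):
        window = words[start:start + size]
        score = SequenceMatcher(None, pattern_words, window).ratio()
        if score >= threshold:
            hits.append((start, score, " ".join(window)))

    # Among overlapping windows, keep only the best-scoring one.
    hits.sort(key=lambda hit: -hit[1])
    chosen = []
    for hit in hits:
        if all(abs(hit[0] - kept[0]) >= size for kept in chosen):
            chosen.append(hit)
    return sorted(chosen)


if __name__ == "__main__":
    doc = (
        "Returns the number of items in the list. Some other text here. "
        "Returns the number of elements in the list."
    )
    for offset, score, fragment in find_near_duplicates(
        doc, "Returns the number of items in the list."
    ):
        print(offset, round(score, 3), fragment)
```

In an interactive setting, the user would inspect the reported fragments, then adjust the pattern or the threshold and search again. A fixed-size window is a simplification: it cannot match near duplicates that are much longer or shorter than the pattern.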
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Data Mining and Analysis
