Enhancing API Documentation through BERTopic Modeling and Summarization
AmirHossein Naghshzan, Sylvie Ratte

TL;DR
This paper introduces a novel method combining BERTopic and NLP techniques to automatically generate summaries and identify key topics in API documentation, improving accessibility and comprehension for developers.
Contribution
It presents a new approach that leverages BERTopic modeling and summarization to enhance API documentation analysis and developer understanding.
Findings
Effective topic extraction from API docs
Improved summaries for complex documentation
Enhanced developer information retrieval
Abstract
As the amount of textual data in various fields, including software development, continues to grow, there is a pressing demand for efficient and effective extraction and presentation of meaningful insights. This paper presents a unique approach to address this need, focusing on the complexities of interpreting Application Programming Interface (API) documentation. While official API documentation serves as a primary source of information for developers, it is often extensive and lacks user-friendliness. As a result, developers frequently resort to unofficial sources such as Stack Overflow and GitHub. Our novel approach leverages the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation, thereby creating a more efficient method for developers to extract the information they need. The produced summaries…
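The abstract does not spell out the pipeline, so the following is only a rough sketch of the general idea (topic keywords plus an extractive summary), not the authors' implementation. It assumes the sentences have already been grouped into clusters, which BERTopic would normally do with embeddings and clustering, and it uses a simplified, unnormalized form of the class-based TF-IDF weighting that BERTopic uses for topic representations. The function names, the stop-word list, and the sample sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "the", "to", "with", "over", "after"}


def tokenize(text):
    """Lowercase word tokens minus stop words; a stand-in for real preprocessing."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def c_tf_idf(clusters):
    """Score each term per cluster as tf(t, c) * log(1 + A / f(t)).

    clusters maps a cluster label to a list of sentences. A is the average
    number of tokens per cluster; f(t) is the count of t across all clusters.
    """
    counts = {
        label: Counter(tok for s in sents for tok in tokenize(s))
        for label, sents in clusters.items()
    }
    total = Counter()
    for c in counts.values():
        total.update(c)
    avg_tokens = sum(total.values()) / len(counts)
    return {
        label: {t: n * math.log(1 + avg_tokens / total[t]) for t, n in c.items()}
        for label, c in counts.items()
    }


def summarize(clusters, top_terms=2):
    """Return, per cluster, (top keywords, the sentence that best covers them)."""
    scores = c_tf_idf(clusters)
    result = {}
    for label, sents in clusters.items():
        weights = scores[label]
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        best = max(
            sents,
            key=lambda s: sum(weights.get(t, 0.0) for t in set(tokenize(s))),
        )
        result[label] = (ranked[:top_terms], best)
    return result


# Hypothetical, pre-clustered documentation sentences.
clusters = {
    "io": ["Open the file with a reader.", "Close the reader after reading the file."],
    "net": ["Open a socket to the server.", "Send the request over the socket."],
}
result = summarize(clusters)
for label, (keywords, sentence) in result.items():
    print(label, keywords, sentence)
```

In a real setting the clusters would come from a fitted topic model rather than being written by hand, and the single best-sentence pick would likely be replaced by a proper summarization step.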
Taxonomy
Topics: Software Engineering Research · Web Data Mining and Analysis · Software System Performance and Reliability
