Automating API Documentation from Crowdsourced Knowledge
Bonan Kou, Zijie Zhou, Muhao Chen, Tianyi Zhang

TL;DR
AutoDoc uses Stack Overflow discussions and GPT-4o to generate API documentation that is more accurate, more comprehensive, and less redundant, addressing the obsolescence and incompleteness of official docs.
Contribution
The paper introduces AutoDoc, an approach that combines a fine-tuned dense retrieval model with GPT-4o-based summarization to generate API documentation from crowdsourced Stack Overflow knowledge.
Findings
AutoDoc produces API documents that are up to 77.7% more accurate than those of the comparison baselines.
Its documents are 9.5% less duplicated.
34.4% of the knowledge in the generated documents is not covered by official docs.
Abstract
API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge uncovered by the…
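The retrieve-then-deduplicate flow described in the abstract can be illustrated with a minimal sketch. This is not AutoDoc's implementation: the bag-of-words `embed` function is a stand-in for the fine-tuned dense retrieval model, the similarity threshold is arbitrary, the helper names (`retrieve`, `drop_redundant`) and sample posts are hypothetical, and the GPT-4o summarization and hallucination-handling steps are omitted.

```python
import math
import re
from collections import Counter


def embed(text):
    # Assumption: a bag-of-words count vector stands in for a dense encoder.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, posts, k=2):
    # Rank posts by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(posts, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def drop_redundant(sentences, threshold=0.8):
    # Keep a sentence only if it is not too similar to one already kept.
    # The threshold is an illustrative value, not one taken from the paper.
    kept = []
    for s in sentences:
        if all(cosine(embed(s), embed(t)) < threshold for t in kept):
            kept.append(s)
    return kept


# Hypothetical Stack Overflow snippets, for illustration only.
posts = [
    "json.loads raises JSONDecodeError on invalid input",
    "json.loads raises JSONDecodeError on invalid input strings",
    "Use list comprehensions for readability",
]

hits = retrieve("json.loads invalid input error", posts, k=2)
summary_inputs = drop_redundant(hits)
```

In a fuller pipeline, `summary_inputs` would be what is passed to the summarization model; filtering near-duplicates first is one simple way to limit repeated content in the generated text.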
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Wikis in Education and Collaboration
