Generating Examples From CLI Usage: Can Transformers Help?
Roshanak Zilouchian Moghaddam, Spandan Garg, Colin B. Clement, Yevhen Mohylevskyy, Neel Sundaresan

TL;DR
This paper presents a machine learning system that generates up-to-date software examples from telemetry data. Tested on Azure CLI, it improves documentation accuracy and reduces the manual effort of keeping documentation current.
Contribution
The paper introduces a practical system that combines feature-based and transformer-based ML approaches to generate software examples, achieving 100% coverage of Azure CLI functionalities and reducing documentation PRs by over 68%.
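To make the idea of generating examples from usage data concrete, here is a minimal sketch of one plausible first step: counting which parameter combinations appear most often for each command in telemetry, then turning the most common combination into an example template with placeholders. This is not the authors' pipeline. The telemetry records, the function names `most_common_parameter_sets` and `to_template`, and the placeholder format are all assumptions made for illustration, and the step that fills in realistic parameter values (where the feature-based or transformer-based model would come in) is left out.

```python
from collections import Counter

# Hypothetical telemetry records, invented for illustration:
# each entry is (command, parameter names used in one invocation).
TELEMETRY = [
    ("az vm create", ("--name", "--resource-group", "--image")),
    ("az vm create", ("--resource-group", "--name", "--image")),
    ("az vm create", ("--name", "--resource-group")),
    ("az group create", ("--name", "--location")),
]


def most_common_parameter_sets(records):
    """Return, for each command, its most frequently used parameter set."""
    counts = {}
    for command, params in records:
        # Sort so that the same parameters in a different order count as one set.
        key = tuple(sorted(params))
        counts.setdefault(command, Counter())[key] += 1
    return {cmd: counter.most_common(1)[0][0] for cmd, counter in counts.items()}


def to_template(command, params):
    """Render a command and parameter names as an example with <placeholders>."""
    parts = [command]
    for param in params:
        parts.append(f"{param} <{param.lstrip('-')}>")
    return " ".join(parts)


templates = {
    cmd: to_template(cmd, params)
    for cmd, params in most_common_parameter_sets(TELEMETRY).items()
}

for template in templates.values():
    print(template)
```

A frequency count like this only yields the shape of an example. Choosing sensible values for each placeholder is the harder part, and is the part the summary attributes to the ML models.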
Findings
Achieves 100% coverage of functionalities in Azure CLI
Reduces documentation PRs by over 68%
Has operated effectively in a production environment for 3 years
Abstract
Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can lead programs to fail or be less efficient or even less secure. In response, programmers need to regularly turn to other resources on the web such as StackOverflow for examples to guide them in writing software. We recognize that this inconvenient, error-prone, and expensive process can be improved by using machine learning applied to software usage data. In this paper, we present our practical system which uses machine learning on large-scale telemetry data and documentation corpora, generating appropriate and complex examples that can be used to improve documentation. We discuss both feature-based and transformer-based machine learning approaches and demonstrate that our system achieves…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Scientific Computing and Data Management
