Evaluating Transfer Learning for Simplifying GitHub READMEs
Haoyu Gao, Christoph Treude, Mansooreh Zahedi

TL;DR
This paper investigates transfer learning with Transformer models for automatically simplifying GitHub README files, drawing on general text simplification data to make the READMEs easier to comprehend.
Contribution
It introduces a transfer learning approach that combines general-domain and software-specific data to enhance automatic README simplification.
Findings
Transfer learning improves BLEU scores in README simplification.
The best model outperformed baselines in human evaluations.
Transfer learning helps mitigate data scarcity and style drift issues.
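Since the findings are reported in terms of BLEU, a minimal sketch of a sentence-level BLEU computation may help in reading them. It is an illustration only: the whitespace tokenization, the single reference, the absence of smoothing, and the function names `ngrams` and `bleu` are all assumptions, and this summary does not say which BLEU implementation the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (illustrative).

    Assumes whitespace tokenization; returns 0.0 if any n-gram order
    has no overlap with the reference.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical model output compared with a hypothetical simplified reference.
score = bleu("run the install script now", "run the install script first")
print(round(score, 3))
```

A higher score means the model output shares more n-grams with the reference simplification. It does not by itself show that the output is easier to read, which is presumably why the findings also cite human evaluations.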
Abstract
Software documentation captures detailed knowledge about a software product, e.g., code, technologies, and design. It plays an important role in the coordination of development teams and in conveying ideas to various stakeholders. However, software documentation can be hard to comprehend if it is written with jargon and complicated sentence structure. In this study, we explored the potential of text simplification techniques in the domain of software engineering to automatically simplify GitHub README files. We collected software-related pairs of GitHub README files consisting of 14,588 entries, aligned difficult sentences with their simplified counterparts, and trained a Transformer-based model to automatically simplify difficult versions. To mitigate the sparse and noisy nature of the software-related simplification dataset, we applied general text simplification knowledge to this…
