Evaluating Transfer Learning for Simplifying GitHub READMEs

Haoyu Gao; Christoph Treude; Mansooreh Zahedi

arXiv:2308.09940 · cs.SE · August 22, 2023

TL;DR

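This paper investigates transfer learning with Transformer models for automatically simplifying GitHub README files. Knowledge from general-domain text simplification data is used to compensate for the sparse and noisy software-specific data, with the aim of making READMEs easier to comprehend.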

Contribution

It introduces a transfer learning approach that combines general-domain and software-specific data to enhance automatic README simplification.

Findings

01. Transfer learning improves BLEU scores in README simplification.

02. The best model outperformed baselines in human evaluations.

03. Transfer learning helps mitigate data scarcity and style drift issues.
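
For readers unfamiliar with the BLEU metric mentioned in the findings, here is a minimal sketch of a sentence-level BLEU score in plain Python. It is an illustration only and not the paper's evaluation code: the function names `ngrams` and `bleu`, the whitespace tokenization, the single-reference setup, and the absence of smoothing are all assumptions made for this example. Published results usually come from an established toolkit with its own tokenization and corpus-level aggregation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumptions for this sketch: lowercase whitespace tokenization,
    uniform weights over 1..max_n grams, and a score of 0.0 whenever
    any n-gram order has no overlap (no smoothing).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the paper's dataset.
    reference = "run the install script to set up the project"
    output = "run the install script to set up the tool"
    print(round(bleu(output, reference), 3))
```

A higher score means the model output shares more n-grams with the reference simplification. BLEU measures surface overlap only, which is one reason the findings also report human evaluations.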

Abstract

Software documentation captures detailed knowledge about a software product, e.g., code, technologies, and design. It plays an important role in the coordination of development teams and in conveying ideas to various stakeholders. However, software documentation can be hard to comprehend if it is written with jargon and complicated sentence structure. In this study, we explored the potential of text simplification techniques in the domain of software engineering to automatically simplify GitHub README files. We collected software-related pairs of GitHub README files consisting of 14,588 entries, aligned difficult sentences with their simplified counterparts, and trained a Transformer-based model to automatically simplify difficult versions. To mitigate the sparse and noisy nature of the software-related simplification dataset, we applied general text simplification knowledge to this…
