Towards Bridging Language Gaps in OSS with LLM-Driven Documentation Translation
Elijah Kayode Adejumo, Mariam Guizani, Fatemeh Vares, Brittany Johnson

TL;DR
This paper evaluates the potential of large language models like ChatGPT 4 and Anthropic Claude to translate open source technical documentation, highlighting their strengths in semantic accuracy and challenges in structural preservation.
Contribution
It provides an empirical assessment of LLMs for translating OSS documentation, revealing their capabilities and limitations in real-world scenarios.
Findings
LLMs can produce accurate semantic translations of technical content.
Translation activity is mainly community-driven and concentrated in larger repositories.
LLMs struggle to preserve document structure and technical formatting.
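The last finding, on structure and formatting, can be made concrete with a rough check. The sketch below is not the paper's evaluation method. It is a minimal illustration, under the assumption that structural preservation can be approximated by comparing counts of code fences and headings, and the set of URLs, between a source Markdown file and its translation. The names `structure_profile` and `structural_diff` are hypothetical.

```python
import re

# Assumption: these three patterns are a crude proxy for "document structure".
FENCE = re.compile(r"^`{3}", re.MULTILINE)        # lines that open or close a code fence
URL = re.compile(r"https?://[^\s)>\]]+")          # bare or inline-link URLs
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)   # ATX-style headings


def structure_profile(markdown: str) -> dict:
    """Summarise the structural elements found in a Markdown string."""
    return {
        "code_fences": len(FENCE.findall(markdown)),
        "urls": sorted(URL.findall(markdown)),
        "headings": len(HEADING.findall(markdown)),
    }


def structural_diff(source: str, translation: str) -> dict:
    """Return only the elements whose profile differs, as (source, translation) pairs."""
    src = structure_profile(source)
    dst = structure_profile(translation)
    return {key: (src[key], dst[key]) for key in src if src[key] != dst[key]}


# Illustrative inputs (invented for this sketch, not taken from the paper).
fence = "`" * 3
source = f"# Install\n\nSee https://example.org/docs\n\n{fence}\npip install demo\n{fence}\n"
translation = "# Instalar\n\nVer https://example.org/docs\n\npip install demo\n"

diff = structural_diff(source, translation)
```

Here the translation keeps the heading and the URL but drops the code fence, so `diff` reports only the `code_fences` entry. A real evaluation would need a proper Markdown parser and a judgment about which differences matter.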
Abstract
While open source communities attract diverse contributors across the globe, only a few open source software repositories provide essential documentation, such as ReadMe or CONTRIBUTING files, in languages other than English. Recently, large language models (LLMs) have demonstrated remarkable capabilities in a variety of software engineering tasks. We have also seen advances in the use of LLMs for translations in other domains and contexts. Despite this progress, little is known regarding the capabilities of LLMs in translating open-source technical documentation, which is often a mixture of natural language, code, URLs, and markdown formatting. To better understand the need and potential for LLMs to support translation of technical documentation in open source, we conducted an empirical evaluation of translation activity and translation capabilities of two powerful large language…
Taxonomy
Topics: Software Engineering Research · Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management
