LLM-based Content Classification Approach for GitHub Repositories by the README Files

Malik Uzair Mehmood; Shahid Hussain; Wen Li Wang; Muhammad Usama Malik

arXiv:2507.21899·cs.AI·July 30, 2025

TL;DR

This paper develops an LLM-based method to automatically classify sections of GitHub README files, significantly improving accuracy and efficiency, and explores parameter-efficient fine-tuning (PEFT) techniques as economical alternatives to full fine-tuning.

Contribution

It introduces a fine-tuning approach for LLMs to classify README sections, outperforming existing methods and incorporating PEFT techniques like LoRA for cost-effective training.
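To make the cost argument behind LoRA concrete, the following is a minimal pure-Python sketch of the low-rank update it relies on: the pretrained weight matrix W stays frozen, and only two small matrices A and B are trained, giving an effective weight W + (alpha / r) * B A. The helper names, the toy dimensions, and the 768 x 768 layer with rank 8 are illustrative assumptions and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the rank (number of rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Trainable parameter counts: (full fine-tuning, LoRA adapters only)."""
    return d_out * d_in, r * (d_out + d_in)


# Hypothetical toy sizes, chosen only for illustration.
d_out, d_in, r, alpha = 6, 4, 2, 4
rng = random.Random(0)
w = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]      # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init

# With B initialised to zero, the adapted layer starts out identical to the frozen one.
w_eff = lora_effective_weight(w, a, b, alpha)

# Assumed layer shape (768 x 768) and rank (8), not values reported by the paper.
full, lora = trainable_params(768, 768, 8)
print(f"full fine-tuning: {full} params, LoRA: {lora} params")
```

Under these assumed sizes the adapter trains 12,288 parameters for the layer instead of 589,824, which is the kind of saving that makes PEFT attractive when compute is limited.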

Findings

01. Achieved an F1 score of 0.98 on README section classification.

02. Demonstrated the effectiveness of PEFT techniques like LoRA.

03. Outperformed current state-of-the-art methods.
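For readers less familiar with the reported metric, here is a small sketch of how an F1 score can be computed for a multi-class section classifier. It uses macro averaging, which is an assumption: this summary does not say which averaging the paper uses. The label names and predictions are hypothetical toy data, not the paper's categories or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical section labels and predictions, for illustration only.
y_true = ["what", "how", "how", "who", "what", "how"]
y_pred = ["what", "how", "what", "who", "what", "how"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

F1 is the harmonic mean of precision and recall, so it is a different quantity from plain accuracy, and a high score requires both few false positives and few false negatives in each class.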

Abstract

GitHub is the world's most popular platform for storing, sharing, and managing code. Every GitHub repository has a README file associated with it. The README files should contain project-related information as per the recommendations of GitHub to support the usage and improvement of repositories. However, GitHub repository owners sometimes neglect these recommendations. This prevents a GitHub repository from reaching its full potential. This research posits that the comprehensiveness of a GitHub repository's README file significantly influences its adoption and utilization, with a lack of detail potentially hindering its full potential for widespread engagement and impact within the research community. Large Language Models (LLMs) have shown great performance in many text-based tasks, including text classification, text generation, text summarization, and text translation. In this…
