Quality Classifiers for Open Source Software Repositories

George Tsatsaronis; Maria Halkidi; Emmanouel A. Giakoumakis

arXiv:0904.4708 · cs.SE · May 1, 2009 · 2 cites

TL;DR

This paper presents a data mining approach that trains classifiers on OSS repository meta-data to predict project success, helping to identify promising open source projects for further development.

Contribution

It introduces a classifier training method that uses OSS repository meta-data to predict project success, presented as a new application in open source software analysis.

Findings

01

Classifiers can effectively predict OSS project success.

02

Meta-data features are strong indicators of project continuation.

03

The approach aids in early identification of promising OSS projects.

Abstract

Open Source Software (OSS) often relies on large repositories, like SourceForge, for initial incubation. These repositories offer a large variety of meta-data that provide interesting information about projects and their success. In this paper we propose a data mining approach for training classifiers on the OSS meta-data provided by such repositories. The classifiers learn to predict the successful continuation of an OSS project. The 'successfulness' of a project is defined in terms of the confidence with which the classifier predicts that it could be ported into popular OSS projects (such as FreeBSD, Gentoo Portage).
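
To make the idea of "successfulness as classifier confidence" concrete, the following is a minimal sketch and not the authors' method. It assumes a plain logistic regression, three invented meta-data features (log downloads, developer count, months since last release), and a handful of made-up projects labeled by whether they were ported. The abstract does not state which classifiers or features the paper uses, so every name and value below is hypothetical.

```python
import math

# Hypothetical toy data, invented for illustration only. Each row is
# (log10 downloads, number of developers, months since last release).
PROJECTS = [
    (5.2, 12, 1),
    (4.8, 7, 2),
    (4.1, 5, 3),
    (2.0, 1, 30),
    (1.5, 1, 48),
    (2.6, 2, 24),
]
# Assumed label: 1 = project was ported (e.g. into FreeBSD ports), 0 = not.
PORTED = [1, 1, 1, 0, 0, 0]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def successfulness(project, weights, bias, means, stds):
    """Return the classifier's confidence that the project would be ported."""
    x = [(v - m) / s for v, m, s in zip(project, means, stds)]
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


SCALED, MEANS, STDS = standardize(PROJECTS)
WEIGHTS, BIAS = train_logistic(SCALED, PORTED)

# Two unseen, equally hypothetical projects scored by confidence.
active = successfulness((4.9, 9, 2), WEIGHTS, BIAS, MEANS, STDS)
dormant = successfulness((1.8, 1, 36), WEIGHTS, BIAS, MEANS, STDS)
```

Here `active` and `dormant` are probabilities in [0, 1], so projects can be ranked by score rather than only labeled as ported or not ported. That ranking is the sense in which the abstract ties success to classifier confidence.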

Taxonomy

Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Data Stream Mining Techniques