End to End Software Engineering Research

Idan Amit

arXiv:2112.11858 · cs.SE · December 23, 2021 · 1 citation

TL;DR

This paper frames end-to-end machine learning for software engineering as predicting process metrics (defects, code quality, productivity) directly from source code, without hand-crafted domain features, and describes a dataset of 5M files from 15k projects built for that purpose.

Contribution

It presents an end-to-end approach to software engineering research, along with a large dataset built to support both predicting process metrics and investigating their causes.

Findings

01 Process metrics such as defects, code quality and productivity are framed as prediction targets learned from raw source code.

02 A dataset of 5 million files from 15,000 projects, constructed for this research.

03 Knowledge extraction that does not depend on domain experts.

Abstract

End-to-end learning is machine learning that starts from raw data and predicts a desired concept, with all steps done automatically. In the software engineering context, we see it as starting from the source code and predicting process metrics. This framework can be used for predicting defects, code quality, productivity and more. End-to-end learning improves over feature-based machine learning by not requiring domain experts and by being able to extract new knowledge. We describe a dataset of 5M files from 15k projects constructed for this goal. The dataset is constructed in a way that enables not only predicting concepts but also investigating their causes.
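
To make the idea in the abstract concrete, here is a minimal sketch of the "raw source code in, process metric out" pipeline. It is not the paper's method or the layout of its dataset. Everything in it is an assumption made for illustration: the toy files, the binary label (imagine "the file was later touched by a bug-fix commit"), the hashed character trigram encoding, and the names `encode`, `train` and `predict_proba`. A real end-to-end system would more likely learn its representation, for example with a neural network; the fixed hashing step is used here only to keep the sketch dependency-free.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple

DIM = 4096  # number of hash buckets; an arbitrary choice for this sketch

# Made-up toy files and labels, for illustration only (not from the dataset).
TOY_FILES = [
    "def add(a, b):\n    return a + b\n",
    "def load(p):\n    try:\n        return open(p).read()\n    except:\n        pass\n",
    "def mul(a, b):\n    return a * b\n",
    "def stop(s):\n    try:\n        s.close()\n    except:\n        pass\n",
]
TOY_LABELS = [0, 1, 0, 1]  # hypothetical process-metric label per file


def encode(source: str, n: int = 3) -> Dict[int, float]:
    """Turn raw source text into L2-normalised hashed character n-gram counts.

    No software engineering knowledge (metrics, smells, AST rules) is used.
    """
    counts: Dict[int, float] = {}
    for i in range(len(source) - n + 1):
        bucket = zlib.crc32(source[i:i + n].encode("utf-8")) % DIM
        counts[bucket] = counts.get(bucket, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {k: v / norm for k, v in counts.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights: List[float], bias: float, x: Dict[int, float]) -> float:
    """Probability that the file has label 1 under a logistic model."""
    return sigmoid(bias + sum(weights[k] * v for k, v in x.items()))


def train(files: Sequence[str], labels: Sequence[int],
          steps: int = 1000, lr: float = 1.0) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent."""
    xs = [encode(f) for f in files]
    weights, bias = [0.0] * DIM, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict_proba(weights, bias, x) - y for x, y in zip(xs, labels)]
        bias -= lr * sum(errors) / n
        for x, err in zip(xs, errors):
            for k, v in x.items():
                weights[k] -= lr * err * v / n
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(TOY_FILES, TOY_LABELS)
    for source, label in zip(TOY_FILES, TOY_LABELS):
        print(label, round(predict_proba(weights, bias, encode(source)), 3))
```

The point of the sketch is the shape of the pipeline: the only inputs are file contents and a label derived from the development process, so no domain expert has to decide in advance which code properties matter. Inspecting which buckets receive large weights is a crude analogue of the "extract new knowledge" step the abstract mentions.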

Code & Models

Repositories

evidencebp/e2ese (official)

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability