End to End Software Engineering Research
Idan Amit

TL;DR
This paper introduces an end-to-end machine learning framework for software engineering that predicts process metrics directly from source code, using a large dataset to improve defect and quality prediction without domain-specific features.
Contribution
It presents a novel end-to-end approach in software engineering, along with a large dataset enabling direct prediction and cause analysis of software metrics.
Findings
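Defects, code quality, and productivity are framed as prediction targets learned directly from raw source code.
A dataset of 5 million files from 15,000 projects, built to support both prediction and investigation of causes.
End-to-end learning does not require domain experts and can extract new knowledge.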
Abstract
End-to-end learning is machine learning that starts from raw data and predicts a desired concept, with all steps done automatically. In the software engineering context, we see it as starting from the source code and predicting process metrics. This framework can be used for predicting defects, code quality, productivity, and more. End-to-end learning improves over feature-based machine learning by not requiring domain experts and by being able to extract new knowledge. We describe a dataset of 5M files from 15k projects constructed for this goal. The dataset is constructed in a way that enables not only predicting concepts but also investigating their causes.
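The idea of going from raw source text to a process label can be illustrated with a minimal sketch. This is not the paper's method or dataset: the file contents and labels below are invented for illustration, the names `featurize`, `train`, and `predict_proba` are hypothetical, and generic hashed byte trigrams with logistic regression stand in for the learned representation that a real end-to-end model (for example, a neural network) would provide. The only point shown is that no software-engineering-specific features are hand-crafted.

```python
import math
import zlib
from collections import Counter

DIM = 512  # size of the hashed input space (arbitrary choice for this sketch)

def featurize(source):
    """Map raw source text to hashed byte-trigram counts (no domain features)."""
    counts = Counter()
    data = source.encode("utf-8")
    for i in range(len(data) - 2):
        # crc32 is deterministic across runs, unlike Python's salted str hash
        counts[zlib.crc32(data[i:i + 3]) % DIM] += 1
    return counts

def predict_proba(weights, bias, feats):
    """Logistic regression score for one file, clamped to avoid overflow."""
    z = bias + sum(weights[j] * v for j, v in feats.items())
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train(files, labels, epochs=300, lr=0.05):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights = [0.0] * DIM
    bias = 0.0
    feats = [featurize(f) for f in files]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            err = predict_proba(weights, bias, x) - y
            for j, v in x.items():
                weights[j] -= lr * err * v
            bias -= lr * err
    return weights, bias

# Invented toy data: raw file contents and a hypothetical process label
# (1 = "file was later touched by a bug-fix commit", 0 = otherwise).
files = [
    "def add(a, b):\n    return a + b\n",
    "def area(w, h):\n    return w * h\n",
    "try:\n    run()\nexcept:\n    pass  # TODO fix\n",
    "try:\n    load()\nexcept:\n    pass  # FIXME\n",
]
labels = [0, 0, 1, 1]

weights, bias = train(files, labels)
for src, y in zip(files, labels):
    p = predict_proba(weights, bias, featurize(src))
    print(f"label={y} predicted_probability={p:.2f}")
```

In a realistic setting the label would come from the project's history (the process metric) rather than being typed in by hand, and the model would be evaluated on files it was not trained on.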
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
