Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method
Parvez Mahbub, Naz Zarreen Oishie, S M Rafizul Haque

TL;DR
This paper proposes a stacking ensemble classifier, built on deep neural networks, random forests, and support vector machines (SVMs), that identifies the authors of source code segments even when a segment has multiple authors, a case where typical single-author identification methods no longer work.
Contribution
The paper presents a stacking ensemble approach that identifies the author group of a source code segment written by multiple authors, addressing a gap left by techniques that assume a single author.
Findings
Enhanced accuracy in multi-author source code attribution
Outperforms existing single-author identification methods
Effective in complex authorship scenarios
Abstract
Source code segment authorship identification is the task of identifying the author of a source code segment through supervised learning. It is important for plagiarism detection, digital forensics, and several other law enforcement issues. However, when a source code segment is written by multiple authors, typical author identification methods no longer work. This paper proposes an author identification technique, based on a stacking ensemble classifier, that can predict the authorship of source code segments even when there are multiple authors. The technique is built upon several deep neural network, random forest, and support vector machine classifiers. The results show that a single classification technique is no longer sufficient for identifying the author group, and that a deep neural network-based stacking ensemble method can enhance the accuracy…
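To make the stacking idea concrete, here is a minimal sketch using scikit-learn's `StackingClassifier`. It is not the authors' implementation. The abstract does not specify the features, the hyperparameters, or how the ensemble is wired, so everything below is an assumption for illustration: the synthetic feature vectors stand in for features extracted from code segments, each class label stands in for one author group, and the helper names `build_stacking_classifier` and `demo` are hypothetical. The base learners are one neural network, one random forest, and one SVM, and a small neural network serves as the meta-learner, in the spirit of the "deep neural network-based stacking ensemble" the abstract mentions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_classifier(seed=0):
    """Hypothetical helper: three base learners plus a neural meta-learner."""
    base_learners = [
        # Feature scaling helps the neural network and the SVM; the forest does not need it.
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=seed),
        )),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ]
    # The meta-learner is trained on cross-validated outputs of the base learners.
    meta_learner = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=seed)
    return StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=3)


def demo(seed=0):
    """Hypothetical demo on synthetic data; labels 0..2 stand in for author groups."""
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_stacking_classifier(seed)
    model.fit(X_train, y_train)
    return model, model.predict(X_test), y_test


if __name__ == "__main__":
    model, predictions, y_test = demo()
    print("held-out accuracy:", (predictions == y_test).mean())
```

Treating each distinct set of authors as a single class keeps the sketch an ordinary multi-class problem. Whether the paper frames multi-author prediction this way cannot be confirmed from the abstract alone.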
