Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method

Parvez Mahbub; Naz Zarreen Oishie; S M Rafizul Haque

arXiv:2212.05610 · cs.SE · December 13, 2022

TL;DR

This paper introduces a stacking ensemble classifier that combines deep neural networks, random forests, and support vector machines (SVMs) to identify the author or authors of a source code segment, improving accuracy in the multi-author case, where existing single-author methods fall short.

Contribution

It presents a novel ensemble approach that can accurately identify multiple authors of a source code segment, addressing a gap left by existing techniques, which assume a single author.

Findings

01. Enhanced accuracy in multi-author source code attribution
02. Outperforms existing single-author identification methods
03. Effective in complex authorship scenarios

Abstract

Source code segment authorship identification is the task of identifying the author of a source code segment through supervised learning. It is of great importance in plagiarism detection, digital forensics, and several other law enforcement applications. However, when a source code segment is written by multiple authors, typical author identification methods no longer work. Here, we propose an author identification technique that uses a stacking ensemble classifier to predict the authorship of source code segments, even when a segment has multiple authors. The technique is built on several deep neural network, random forest, and support vector machine classifiers. We show that a single classification technique is no longer sufficient for identifying the author group, and that a deep neural network-based stacking ensemble method can enhance the accuracy…
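The stacking idea described in the abstract can be sketched in a few dozen lines. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the paper's deep neural network, random forest and SVM base learners are swapped for two simple stand-ins (a k-nearest-neighbour voter and a softmax nearest-centroid scorer), the two-dimensional feature vectors and the author names are hypothetical, and treating each author group as a single class label such as "alice+bob" is an assumption about how multiple authors might be encoded. The mechanism is the general one: base learners emit class scores, and a meta-learner is trained on their out-of-fold scores.

```python
import math
from collections import Counter, defaultdict


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class KNN:
    """Stand-in base learner: class scores are vote fractions of the k nearest rows."""

    def __init__(self, classes, k=3):
        self.classes, self.k = classes, k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def proba(self, x):
        ranked = sorted(range(len(self.X)), key=lambda i: distance(x, self.X[i]))
        votes = Counter(self.y[i] for i in ranked[: self.k])
        used = min(self.k, len(self.X))
        return [votes[c] / used for c in self.classes]


class SoftCentroid:
    """Stand-in learner: softmax over negative distances to each class mean."""

    def __init__(self, classes):
        self.classes = classes

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def proba(self, x):
        # A class missing from the training fold gets a score of 0.
        w = [math.exp(-distance(x, self.centroids[c])) if c in self.centroids else 0.0
             for c in self.classes]
        total = sum(w) or 1.0
        return [v / total for v in w]


class StackingEnsemble:
    """Level-0 learners emit class scores; the level-1 (meta) learner is fit on
    out-of-fold scores, so it never sees a score computed on a row the base
    learner was trained on."""

    def __init__(self, base_factories, meta_factory, folds=3):
        self.base_factories, self.meta_factory, self.folds = base_factories, meta_factory, folds

    def _meta_features(self, models, x):
        # Concatenate every base learner's class-score vector.
        return [s for m in models for s in m.proba(x)]

    def fit(self, X, y):
        n = len(X)
        meta_X = [None] * n
        for f in range(self.folds):
            train = [i for i in range(n) if i % self.folds != f]
            held_out = [i for i in range(n) if i % self.folds == f]
            models = [make().fit([X[i] for i in train], [y[i] for i in train])
                      for make in self.base_factories]
            for i in held_out:
                meta_X[i] = self._meta_features(models, X[i])
        # Refit the base learners on all rows for use at prediction time.
        self.base = [make().fit(X, y) for make in self.base_factories]
        self.meta = self.meta_factory().fit(meta_X, y)
        return self

    def predict(self, x):
        scores = self.meta.proba(self._meta_features(self.base, x))
        return self.meta.classes[scores.index(max(scores))]


# Hypothetical 2-D style features; each label is an author group.
X = [[0.10, 0.90], [0.15, 0.85], [0.12, 0.88],
     [0.90, 0.10], [0.85, 0.15], [0.88, 0.12],
     [0.50, 0.50], [0.55, 0.45], [0.48, 0.52]]
y = ["alice"] * 3 + ["bob"] * 3 + ["alice+bob"] * 3
classes = sorted(set(y))

model = StackingEnsemble(
    base_factories=[lambda: KNN(classes, k=3), lambda: SoftCentroid(classes)],
    meta_factory=lambda: SoftCentroid(classes),
).fit(X, y)
print(model.predict([0.52, 0.48]))
```

Fitting the meta-learner on out-of-fold scores rather than on in-sample scores is the design choice that keeps it from simply trusting whichever base learner overfits most. Where scikit-learn is available, the same structure maps onto `sklearn.ensemble.StackingClassifier`, with estimators such as `MLPClassifier`, `RandomForestClassifier` and `SVC` and a `final_estimator` of choice; its `cv` parameter plays the role of the fold loop above.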
