Utilization of machine learning for the detection of self-admitted vulnerabilities

Moritz Mock

arXiv:2309.15619·cs.SE·December 5, 2023

Open Access

TL;DR

This paper explores using machine learning and NLP techniques to automatically detect self-admitted vulnerabilities in source code comments, aiming to streamline vulnerability identification in software development.

Contribution

The paper proposes an approach that combines NLP and NL-PL (natural language and programming language) methods to identify vulnerabilities and the SATD related to them, and it outlines a CI/CD pipeline for practical vulnerability detection.

Findings

1. Effective identification of SATD in source code comments

2. Enhanced understanding of vulnerability semantics in comments

3. Proposed pipeline facilitates practical vulnerability detection

Abstract

Motivation: Technical debt is a metaphor that describes not-quite-right code introduced for short-term needs. Developers are aware of it and admit it in source code comments, which is called Self-Admitted Technical Debt (SATD). SATD therefore indicates weak code that developers already know about.

Problem statement: Inspecting source code manually is time-consuming, so automatically inspecting source code for vulnerabilities is a crucial aspect of developing software. It helps practitioners reduce the time spent on inspection and focus on the vulnerable parts of the source code.

Proposal: Accurately identify and better understand the semantics of SATD by leveraging NLP and NL-PL approaches to detect vulnerabilities and the related SATD. Finally, a CI/CD pipeline will be proposed to make the vulnerability discovery process easily accessible to practitioners.
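To make the idea of SATD in comments concrete, the following is a minimal sketch of a keyword baseline, not the learned NLP or NL-PL models the abstract proposes. It pulls C-style comments out of a source string and flags those containing a debt marker. The marker list, the regular expression, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
import re
from typing import List

# Hypothetical marker list for illustration; not taken from the paper.
SATD_MARKERS = ("todo", "fixme", "hack", "xxx", "workaround", "temporary")

# Matches C-style line comments (// ...) and block comments (/* ... */).
# Naive by design: it will also match comment-like text inside string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def extract_comments(source: str) -> List[str]:
    """Return every comment found in the source text, in order."""
    return [match.group(0) for match in COMMENT_RE.finditer(source)]


def is_satd(comment: str) -> bool:
    """Flag a comment if it contains any marker as a whole word."""
    lowered = comment.lower()
    return any(
        re.search(r"\b" + re.escape(marker) + r"\b", lowered)
        for marker in SATD_MARKERS
    )


def find_satd_comments(source: str) -> List[str]:
    """Return only the comments that look like self-admitted technical debt."""
    return [c for c in extract_comments(source) if is_satd(c)]


if __name__ == "__main__":
    example = (
        "int f(char *s) {\n"
        "  // TODO: check buffer length before copy\n"
        "  char buf[8];\n"
        "  strcpy(buf, s); /* plain copy */\n"
        "  return 0;\n"
        "}\n"
    )
    for comment in find_satd_comments(example):
        print(comment)
```

A keyword match like this only catches explicit markers. The motivation for a learned approach, as the abstract frames it, is to capture the semantics of comments that admit weak code without using such markers. In a CI/CD setting, a script of this kind could run on changed files and report flagged comments, though the pipeline design itself is left to the paper.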

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Reliability and Analysis Research