Detecting Vulnerabilities from Issue Reports for Internet-of-Things
Sogol Masoumzadeh

TL;DR
This paper applies machine learning and NLP techniques to detect issue reports that indicate software vulnerabilities in IoT projects. It proposes two approaches: classifiers trained on NLP features, and a BERT model fine-tuned on GitHub issues.
Contribution
It is the first to apply ML and LLMs specifically to IoT vulnerability detection from issue reports, including fine-tuning BERT on IoT data.
Findings
An SVM trained on BERT NLP features achieved the best performance, with an AUC of 0.65.
The fine-tuned BERT reached an accuracy of 0.26, which highlights the importance of data exposure.
Proposed methods set a foundation for IoT vulnerability detection from issue reports.
Abstract
Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT), where analysis is slower than for non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their IoT use remains unexplored. We are the first to tackle this problem by proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues of 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues for classifying vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the receiver operating characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the…
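To put the reported AUC of 0.65 in context, the metric can be read as the probability that a randomly chosen vulnerability-indicating issue receives a higher classifier score than a randomly chosen ordinary issue: 0.5 corresponds to chance and 1.0 to a perfect ranking. The following is a minimal sketch of that pairwise definition only. The function name `roc_auc`, the 0/1 label encoding, and the toy scores are assumptions made for illustration; none of them come from the paper or its data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 values; 1 marks a vulnerability-indicating issue
            (hypothetical encoding, not taken from the paper).
    scores: classifier scores, where higher means "more likely positive".
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration; not results from the paper.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

On this toy data, 5 of the 9 positive/negative pairs are ordered correctly, so the function returns 5/9. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable for large issue sets.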
Taxonomy
Topics: Information and Cyber Security · Software Engineering Research · Advanced Malware Detection Techniques
