Automating SBOM Generation with Zero-Shot Semantic Similarity
Devin Pereira, Christopher Molloy, Sudipta Acharya, Steven H.H. Ding

TL;DR
This paper introduces an automated SBOM generation method that uses zero-shot semantic similarity with transformer models, enhancing software supply-chain security by identifying components without prior training on the specific components being identified.
Contribution
It presents a novel zero-shot approach leveraging transformer-based semantic similarity for automated SBOM generation, improving accuracy and consistency over traditional methods.
Findings
Strong zero-shot classification performance
Effective in static code analysis for component identification
Potential to enhance cybersecurity supply-chain defenses
Abstract
With software ecosystems growing more complex and security and compliance receiving more emphasis, it is increasingly important for manufacturers to inventory the software used on their systems. A Software-Bill-of-Materials (SBOM) is a comprehensive inventory detailing a software application's components and dependencies. Current approaches rely on case-based reasoning, which identifies the software components embedded in binary files inconsistently. We propose a different route: an automated method for generating SBOMs to prevent disastrous supply-chain attacks. Remaining within static code analysis, we frame the problem as a semantic similarity task in which a transformer model is trained to relate a product name to its corresponding version strings. Our test results are compelling, demonstrating the model's strong performance in the…
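The semantic similarity framing in the abstract can be illustrated with a minimal sketch. This is not the authors' model: a character-trigram count vector stands in for the transformer encoder so the example runs with the standard library alone. The function names, the candidate product list, and the sample version string are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Stand-in "embedding": counts of overlapping character n-grams.
    ASSUMPTION: the paper uses a transformer encoder here, not n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_products(version_string, product_names):
    """Score each candidate product name against a version string
    and return (name, score) pairs, best match first."""
    query = char_ngrams(version_string)
    scored = [(name, cosine(query, char_ngrams(name))) for name in product_names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical inputs: a string as it might be extracted from a binary,
# and a small list of candidate product names.
candidates = ["OpenSSL", "zlib", "SQLite"]
ranking = rank_products("OpenSSL 1.1.1k  25 Mar 2021", candidates)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

Swapping `char_ngrams` for a learned sentence encoder would leave the ranking step unchanged. The point of the sketch is only the shape of the task: embed a product name and a version string, then compare them by similarity.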
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
