Toward Validation of Textual Information Retrieval Techniques for Software Weaknesses
Jukka Ruohonen, Ville Leppänen

TL;DR
This study evaluates textual information retrieval methods for linking software vulnerabilities to weaknesses, finding that regular expression searches over explicitly referenced identifiers currently yield more consistent results than these techniques.
Contribution
It provides a preliminary validation of retrieval techniques for vulnerability-weakness mapping and indicates that explicit referencing of vulnerability and weakness identifiers is preferable to general IR methods for concrete vulnerability tracking.
Findings
IR techniques perform worse than regex searches
Explicit referencing yields more consistent results
Further validation needed for precision improvement
Abstract
This paper presents a preliminary validation of common textual information retrieval techniques for mapping unstructured software vulnerability information to distinct software weaknesses. The validation is carried out with a dataset compiled from four software repositories tracked in the Snyk vulnerability database. According to the results, the information retrieval techniques used perform unsatisfactorily compared to regular expression searches. Although the results vary from one repository to another, the preliminary validation presented indicates that explicit referencing of vulnerability and weakness identifiers is preferable for concrete vulnerability tracking. Such referencing allows the use of keyword-based searches, which currently seem to yield more consistent results compared to information retrieval techniques. Further validation work is required for improving the precision…
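The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the retrieval techniques evaluated, so TF-IDF weighting with cosine similarity is assumed here as a common stand-in. The function names, the sample report text, and the CVE identifier are hypothetical; the two weakness descriptions paraphrase the CWE-79 and CWE-89 titles.

```python
import math
import re
from collections import Counter

# Keyword-based search: match explicitly referenced identifiers.
CWE_RE = re.compile(r"\bCWE-(\d+)\b", re.IGNORECASE)
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cwe_ids(text):
    """Return de-duplicated CWE identifiers in text, sorted by number."""
    numbers = {int(n) for n in CWE_RE.findall(text)}
    return [f"CWE-{n}" for n in sorted(numbers)]


def find_cve_ids(text):
    """Return de-duplicated, upper-cased CVE identifiers in text."""
    return sorted({m.upper() for m in CVE_RE.findall(text)})


# Assumed IR baseline: TF-IDF vectors compared by cosine similarity.
def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_weaknesses(report, weaknesses):
    """Rank weakness IDs by textual similarity to the report, best first."""
    ids = list(weaknesses)
    vectors = tfidf_vectors([weaknesses[i] for i in ids] + [report])
    query = vectors[-1]
    scored = [(i, cosine(query, v)) for i, v in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs for illustration only; the CVE number is made up.
report = ("Fixes CVE-2017-1000001: cross-site scripting in the "
          "template renderer (CWE-79).")
weaknesses = {
    "CWE-79": "Improper neutralization of input during web page "
              "generation (cross-site scripting)",
    "CWE-89": "Improper neutralization of special elements used in "
              "an SQL command (SQL injection)",
}

explicit_ids = find_cwe_ids(report)
ranked = rank_weaknesses(report, weaknesses)
```

The regular expression path returns an answer only when an identifier is written out, which is why explicit referencing matters for it. The similarity ranking always returns some ordering, so its usefulness depends on how closely the wording of a report matches the weakness descriptions.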
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Web Application Security Vulnerabilities
