SIExVulTS: Sensitive Information Exposure Vulnerability Detection System using Transformer Models and Static Analysis

Kyler Katz; Sara Moshtari; Ibrahim Mujhid; Mehdi Mirakhorli; Derek Garcia

arXiv:2508.19472 · cs.CR · August 28, 2025

TL;DR

SIExVulTS is a novel system combining transformer models and static analysis to detect and verify sensitive information exposure vulnerabilities in Java applications, outperforming existing tools and discovering new CVEs.

Contribution

This paper introduces SIExVulTS, integrating transformer-based models with static analysis for comprehensive CWE-200 vulnerability detection and verification in Java code.

Findings

01

Attack Surface Detection achieved an F1 score above 93%

02

Flow Verification increased precision from 22.61% to 87.23%

03

Uncovered six new CVEs in Apache projects
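Precision is the fraction of reported flows that are true exposures, TP / (TP + FP). The sketch below shows how passing candidate flows through a verification filter can raise precision. Everything in it is hypothetical and for illustration only: the `CandidateFlow` type, the toy flows, their labels, the `verifier_score` values, and the 0.5 threshold. It does not reproduce the paper's data or its GraphCodeBERT-based verifier.

```python
from dataclasses import dataclass


@dataclass
class CandidateFlow:
    # Hypothetical candidate flow reported by a static-analysis query (toy data).
    name: str
    is_true_exposure: bool   # ground-truth label, assumed for illustration
    verifier_score: float    # hypothetical verifier confidence in [0, 1]


def precision(flows):
    """TP / (TP + FP) over the reported flows; 0.0 if nothing is reported."""
    if not flows:
        return 0.0
    tp = sum(1 for f in flows if f.is_true_exposure)
    return tp / len(flows)


def verify(flows, threshold=0.5):
    """Keep only flows the (hypothetical) verifier scores at or above the threshold."""
    return [f for f in flows if f.verifier_score >= threshold]


candidates = [
    CandidateFlow("password -> log", True, 0.91),
    CandidateFlow("apiKey -> exception message", True, 0.84),
    CandidateFlow("userName -> debug output", False, 0.62),
    CandidateFlow("pageCount -> log", False, 0.12),
    CandidateFlow("timestamp -> response", False, 0.08),
]

before = precision(candidates)
after = precision(verify(candidates))
print(f"precision before verification: {before:.2f}")
print(f"precision after verification:  {after:.2f}")
```

With these toy values precision rises from 0.40 (2 of 5 flows) to 0.67 (2 of 3 flows), because the filter drops low-scoring false positives while keeping both true exposures. A real verifier can also discard true exposures, so any precision gain should be read alongside recall.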

Abstract

Sensitive Information Exposure (SIEx) vulnerabilities (CWE-200) remain a persistent and under-addressed threat across software systems, often leading to serious security breaches. Existing detection tools rarely target the diverse subcategories of CWE-200 or provide context-aware analysis of code-level data flows. Aims: This paper aims to present SIExVulTS, a novel vulnerability detection system that integrates transformer-based models with static analysis to identify and verify sensitive information exposure in Java applications. Method: SIExVulTS employs a three-stage architecture: (1) an Attack Surface Detection Engine that uses sentence embeddings to identify sensitive variables, strings, comments, and sinks; (2) an Exposure Analysis Engine that instantiates CodeQL queries aligned with the CWE-200 hierarchy; and (3) a Flow Verification Engine that leverages GraphCodeBERT to…
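The abstract says only that the Attack Surface Detection Engine uses sentence embeddings to identify sensitive variables, strings, comments, and sinks. The sketch below illustrates one common way embeddings can be used for this: flag an identifier when its cosine similarity to a "sensitive" prototype meets a threshold. This is an assumption, not the paper's confirmed method. The prototype categories, the 0.8 threshold, the function name `flag_sensitive`, and the 3-dimensional toy vectors are all hypothetical. A real system would obtain the vectors from a sentence-embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings" of sensitive-data categories (not from the paper).
SENSITIVE_PROTOTYPES = {
    "credential": [0.9, 0.1, 0.0],
    "personal data": [0.1, 0.9, 0.1],
}

# Hypothetical toy embeddings of Java identifiers.
IDENTIFIER_EMBEDDINGS = {
    "dbPassword": [0.85, 0.15, 0.05],
    "customerEmail": [0.2, 0.88, 0.1],
    "loopIndex": [0.05, 0.05, 0.95],
}


def flag_sensitive(embeddings, prototypes, threshold=0.8):
    """Map each identifier whose best prototype similarity >= threshold to that prototype's category."""
    flagged = {}
    for ident, vec in embeddings.items():
        # Pick the most similar prototype category for this identifier.
        category, score = max(
            ((cat, cosine(vec, proto)) for cat, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            flagged[ident] = category
    return flagged


print(flag_sensitive(IDENTIFIER_EMBEDDINGS, SENSITIVE_PROTOTYPES))
```

With these toy vectors, `dbPassword` matches "credential" and `customerEmail` matches "personal data", each with a similarity above 0.99. `loopIndex` stays below the threshold and is not flagged.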
