Semantic Labeling for Third-Party Cybersecurity Risk Assessment: A Semi-Supervised Approach to Intent-Aware Question Retrieval
Ali Nour Eldin, Mohamed Sellami, Mehdi Acheli, Walid Gaaloul, Julien Steunou

TL;DR
This paper introduces a semi-supervised, intent-aware question retrieval method for cybersecurity risk assessment that improves efficiency and alignment with assessment scope by using semantic labels instead of direct question similarity.
Contribution
It proposes a novel approach that combines label discovery, large-scale label assignment, and label propagation to enhance question retrieval in cybersecurity assessments.
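The semi-supervised part of such a pipeline can be illustrated with a toy label propagation step. This is a minimal sketch, not the authors' implementation. It assumes each question is already embedded as a vector and that a few seed questions carry labels, for example from clustering or LLM annotation. It then applies a random-walk variant of classic iterative label propagation over a cosine-similarity graph. The function name `propagate_labels`, the parameters `alpha` and `iters`, and the toy vectors and label names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate_labels(embeddings, seeds, labels, alpha=0.8, iters=50):
    """Spread label scores from seed questions to unlabeled ones (hypothetical helper).

    embeddings: one vector per question (assumed precomputed).
    seeds: dict mapping question index -> set of labels (the labeled subset).
    labels: list of all label names in the label space.
    Returns, per question, a dict mapping label -> propagated score.
    """
    n = len(embeddings)
    # Similarity graph: nonnegative cosine weights, no self-loops.
    w = [[max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Row-normalize so each question averages over its neighbours.
    for i in range(n):
        s = sum(w[i])
        if s:
            w[i] = [x / s for x in w[i]]
    # Initial label matrix Y: 1.0 where a seed question carries the label.
    y = [[1.0 if i in seeds and lab in seeds[i] else 0.0 for lab in labels]
         for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # F <- alpha * W F + (1 - alpha) * Y, computed from the previous F.
        f = [[alpha * sum(w[i][j] * f[j][c] for j in range(n))
              + (1 - alpha) * y[i][c]
              for c in range(len(labels))] for i in range(n)]
    return [dict(zip(labels, row)) for row in f]


# Toy usage: two seeded questions and two unlabeled ones.
emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
scores = propagate_labels(emb, {0: {"access_control"}, 1: {"encryption"}},
                          ["access_control", "encryption"])
```

Here question 2 sits close to the `access_control` seed, so it ends up with a higher `access_control` score. Question 3 likewise leans toward `encryption`. The `(1 - alpha) * Y` term keeps seed labels anchored while scores diffuse along the graph.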
Findings
Reduces labeling cost and runtime compared to LLM-based annotation.
Achieves better alignment with cybersecurity control domains.
Maintains label quality while improving retrieval efficiency.
Abstract
Third-Party Risk Assessment (TPRA) relies on large repositories of cybersecurity compliance questions used to assess external suppliers against standards such as ISO/IEC 27001 and NIST. In practice, not all questions are relevant to a specific supplier, and selecting questions for a given assessment context remains a manual and time-consuming task. Existing question retrieval approaches based on lexical or semantic similarity can identify topically related questions, but they often fail to capture the underlying assessment intent, including control domain and evaluation scope. To address this limitation, we investigate whether an explicit semantic label space can improve intent-aware TPRA question selection. In particular, we separate label space discovery from large-scale label assignment. We start by discovering overlapping clusters of semantically similar questions and then exploit…
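One way to picture intent-aware selection over an explicit label space is sketched below. Each question carries a set of possibly overlapping labels. An assessment scope is expressed as a set of labels, and questions are filtered by label overlap rather than by raw text similarity. This is an illustrative sketch under stated assumptions, not the paper's retrieval procedure. The function `select_questions`, the label names, and the Jaccard-based ranking are hypothetical choices.

```python
def select_questions(question_labels, scope, min_overlap=1):
    """Return question ids whose labels intersect the assessment scope (hypothetical helper).

    question_labels: dict mapping question id -> set of labels.
    scope: set of labels describing the assessment intent.
    min_overlap: minimum number of shared labels required to keep a question.
    Results are ranked by Jaccard similarity between question labels and scope.
    """
    ranked = []
    for qid, labs in question_labels.items():
        overlap = labs & scope
        if len(overlap) >= min_overlap:
            jaccard = len(overlap) / len(labs | scope)
            ranked.append((jaccard, qid))
    # Highest Jaccard first; ties broken by question id for a stable order.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [qid for _, qid in ranked]


# Toy label assignments (hypothetical labels, not the paper's label space).
questions = {
    "Q1": {"access_control", "identity_management"},
    "Q2": {"encryption", "data_protection"},
    "Q3": {"access_control"},
    "Q4": {"incident_response"},
}
selected = select_questions(questions, {"access_control", "identity_management"})
```

With this scope, `selected` is `["Q1", "Q3"]`. Filtering on labels means that two questions with similar wording but different control domains are kept apart by their labels rather than merged by embedding proximity.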
