Detecting Underspecification in Software Requirements via k-NN Coverage Geometry
Wenyan Yang, Tom\'a\v{s} Janovec, Samantha Bavautdin

TL;DR
geogap{} is a geometric approach that detects missing requirement types in software specifications by analyzing requirement coverage using pretrained sentence embeddings and k-nearest-neighbour distances, achieving high accuracy.
Contribution
It introduces a novel geometric method combining multiple scoring components to identify underspecified requirement types without requiring manual annotation.
Findings
Achieves 0.935 AUROC on PROMISE NFR benchmark.
Effectively detects completely absent requirement types.
Each component of the pipeline contributes measurable value.
Abstract
We propose \geogap{}, a geometric method for detecting missing requirement types in software specifications. The method represents each requirement as a unit vector via a pretrained sentence encoder, then measures coverage deficits through -nearest-neighbour distances z-scored against per-project baselines. Three complementary scoring components -- per-point geometric coverage, type-restricted distributional coverage, and annotation-free population counting -- fuse into a unified gap score controlled by two hyperparameters. On the PROMISE NFR benchmark, \geogap{} achieves 0.935 AUROC for detecting completely absent requirement types in projects with requirements, matching a ground-truth count oracle that requires human annotation. Six baselines confirm that each pipeline component -- per-project normalisation, neural embeddings, and geometric scoring -- contributes…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Software Engineering Techniques and Practices · Software Reliability and Analysis Research
