Classifier or Prompt: A Case Study on Legal Requirements Traceability
Romina Etezadi, Sallam Abualhaija, Chetan Arora, Lionel Briand

TL;DR
This paper compares two automated, language-model-based solutions for legal requirements traceability, a classifier (Kashif) and a prompt-based approach (RICE_LRT), and shows that both detect trace links more effectively than existing methods.
Contribution
It introduces and empirically evaluates two novel approaches, Kashif and RICE_LRT, for legal requirements traceability, demonstrating their superior performance over baseline models.
Findings
Kashif achieves an F2 score of 63%, outperforming the baselines by 21 percentage points.
RICE_LRT reaches 84% recall and 61% F2 score on GDPR-related documents.
Techniques tailored to legal artifacts outperform general-purpose methods from the literature.
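For readers unfamiliar with the metric, the F2 score reported above is the F-beta score with beta = 2, which weights recall more heavily than precision. The following is a minimal sketch of how it can be computed for a set of predicted trace links. The function names and the toy requirement and provision identifiers are hypothetical and are not taken from the paper or its dataset.

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Precision and recall of predicted trace links against the true links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 favors recall, beta = 1 gives the usual F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy data: (requirement id, legal provision id) pairs.
predicted_links = {("R1", "P5"), ("R1", "P6"), ("R2", "P5"), ("R3", "P32")}
actual_links = {("R1", "P5"), ("R3", "P32")}

p, r = precision_recall(predicted_links, actual_links)
f2 = f_beta(p, r, beta=2.0)
```

In this toy case every true link is found but half of the predictions are wrong, so F2 stays comparatively high. That is the behavior one would want when a missed link is considered costlier than a false alarm.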
Abstract
New regulations are introduced to ensure software development aligns with ethical concerns and protects public safety. Showing compliance requires tracing requirements to legal provisions. Requirements traceability is a key task where engineers must analyze technical requirements against target artifacts, often within limited time. Manually analyzing complex systems with hundreds of requirements is infeasible. The legal dimension adds challenges that increase effort. In this paper, we investigate two automated solutions based on language models, including large ones (LLMs). The first solution, Kashif, is a classifier that leverages sentence transformers and semantic similarity. The second solution, RICE_LRT, prompts a recent LLM based on RICE, a prompt engineering framework. Using a publicly available benchmark dataset, we empirically evaluate Kashif and compare it against seven…
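The abstract states only that Kashif leverages sentence transformers and semantic similarity. The sketch below illustrates the general idea of similarity-based link detection and is not the authors' implementation. It assumes each requirement and each legal provision has already been embedded as a vector, for example by a sentence transformer. The toy vectors, the function names, and the 0.7 threshold are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_links(req_vecs: dict, prov_vecs: dict, threshold: float = 0.7) -> list:
    """Return (requirement, provision, score) triples at or above the threshold."""
    links = []
    for req_id, req_vec in req_vecs.items():
        for prov_id, prov_vec in prov_vecs.items():
            score = cosine(req_vec, prov_vec)
            if score >= threshold:
                links.append((req_id, prov_id, score))
    # Highest-scoring candidates first.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical 3-dimensional embeddings; real sentence embeddings are much larger.
requirements = {"R1": [1.0, 0.0, 0.0], "R2": [0.0, 1.0, 0.0]}
provisions = {"P1": [1.0, 0.0, 0.0], "P2": [0.0, 0.0, 1.0]}

links = candidate_links(requirements, provisions)
```

The threshold trades precision against recall: lowering it proposes more candidate links, which tends to raise recall at the cost of more false positives.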
