Deterministic Fuzzy Triage for Legal Compliance Classification and Evidence Retrieval
Rian Atri

TL;DR
This paper introduces a deterministic, transparent approach to legal evidence classification that combines dual encoders with fuzzy triage bands, reporting AUC 0.98-0.99 on CUAD-derived compliance labels and NDCG@5 of 0.38-0.42 on ACORD-style retrieval.
Contribution
It presents a reproducible method combining deterministic dual encoders and fuzzy thresholds for legal compliance classification, enhancing transparency and auditability.
Findings
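Achieved ACORD-style retrieval performance of NDCG@5 0.38-0.42 and NDCG@10 0.45-0.50 across five random seeds
Attained AUC of 0.98-0.99 and F1 of 0.22-0.30 (depending on positive-class weighting) on CUAD-derived binary compliance classification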
Mapped scalar scores into actionable compliance regions
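To illustrate what mapping a scalar score into compliance regions can look like, here is a minimal sketch. It is not the paper's implementation: the band names (`auto_clear`, `human_review`, `auto_flag`), the thresholds 0.3 and 0.7, and the crisp cut-offs are all assumptions made for illustration, and the paper's fuzzy bands may be defined differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageBands:
    # Hypothetical thresholds, chosen only for illustration (not from the paper).
    low: float = 0.3
    high: float = 0.7


def triage(score: float, bands: TriageBands = TriageBands()) -> str:
    """Map a score in [0, 1] to one of three hypothetical triage regions.

    The mapping is a pure function of the score and the thresholds,
    so the same input always yields the same region.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= bands.high:
        return "auto_flag"      # confident positive: flag as compliance-relevant
    if score >= bands.low:
        return "human_review"   # uncertain middle band: route to a reviewer
    return "auto_clear"         # confident negative: no action needed
```

Because the thresholds live in a small immutable object, they can be logged alongside each decision, which is one way to keep such a triage step auditable.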
Abstract
Legal teams increasingly use machine learning to triage large volumes of contractual evidence, but many models are opaque, non-deterministic, and difficult to align with frameworks such as HIPAA or NERC-CIP. We study a simple, reproducible alternative based on deterministic dual encoders and transparent fuzzy triage bands. We train a RoBERTa-base dual encoder with a 512-dimensional projection and cosine similarity on the ACORD benchmark for graded clause retrieval, then fine-tune it on a CUAD-derived binary compliance dataset. Across five random seeds (40-44) on a single NVIDIA A100 GPU, the model achieves ACORD-style retrieval performance of NDCG@5 0.38-0.42, NDCG@10 0.45-0.50, and 4-star Precision@5 about 0.37 on the test split. On CUAD-derived binary labels, it achieves AUC 0.98-0.99 and F1 0.22-0.30 depending on positive-class weighting, outperforming majority and random baselines…
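For readers unfamiliar with the retrieval metric quoted above, the following sketch computes NDCG@k for one query from graded relevance labels. It assumes the common exponential gain `2^rel - 1` with a `log2` position discount; the benchmark's own scoring script may use a different gain, so treat this as an illustration of the metric rather than the exact evaluation code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so discount is log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[int], k: int) -> float:
    """NDCG@k: DCG of the model's ranking divided by DCG of the ideal ranking.

    `ranked_relevances` holds the graded label of every candidate for the
    query, ordered by the model's score (best first).
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant candidates for this query
    return dcg_at_k(ranked_relevances, k) / ideal
```

A ranking that places the highest-graded clauses first scores 1.0, and pushing a relevant clause down the list lowers the score according to the logarithmic discount.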
Taxonomy
Topics: Artificial Intelligence in Law · Topic Modeling · Imbalanced Data Classification Techniques
