TL;DR
This paper introduces CAR, a retrieval objective focused on identifying the active authority frontier in legal and regulatory texts, with theoretical guarantees and empirical validation across multiple datasets.
Contribution
It formalizes the CAR retrieval objective, characterizes conditions for correct retrieval, and demonstrates its effectiveness with theoretical analysis and real-world experiments.
Findings
CAR retrieval achieves high accuracy on security advisories and legal datasets.
Two-stage retrieval significantly reduces incorrect
Dense retrieval reaches TCA@5 = 0.270; two-stage retrieval reaches 0.975.
Abstract
In law, pharmaceutical regulation, and software security, newer authorities can revoke older, established ones even when the two are semantically distant. We call the resulting task CAR: retrieving the currently active authority frontier for a semantic anchor q, that is, front(cl(A_k(q))). This differs from finding the document with the highest relevance score, argmax_d s(q, d). Theorem 4 characterizes when a retrieved set R fully covers the active authority set for q, that is, when TCA(R, q)=1. It gives two conditions that are together necessary and sufficient for any retrieved set R: frontier inclusion (front(cl(A_k(q))) is contained in R) and no-ignored-superseder (no superseding document exists in the corpus outside R). Proposition 2 shows that TCA@k <= phi(q) * R_anchor(q) in the worst case over any scope-indexed algorithm, proved by an adversarial permutation argument. We evaluated on three real-world datasets: security advisories (Dense…
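The two conditions named for Theorem 4 can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than the paper's implementation: supersession is modeled as a dictionary `superseded_by` mapping a document to the documents that revoke it, cl(.) is read as the set reachable by following those edges, front(.) as the members that nothing supersedes, and no-ignored-superseder as "every superseder of a retrieved document is itself retrieved". The function names and the advisory identifiers are hypothetical.

```python
from collections import deque


def closure(anchors, superseded_by):
    """Anchors plus every document reachable via supersession edges (assumed reading of cl)."""
    seen = set(anchors)
    queue = deque(anchors)
    while queue:
        doc = queue.popleft()
        for newer in superseded_by.get(doc, ()):
            if newer not in seen:
                seen.add(newer)
                queue.append(newer)
    return seen


def frontier(docs, superseded_by):
    """Documents in `docs` that no other document supersedes (assumed reading of front)."""
    return {d for d in docs if not superseded_by.get(d)}


def tca(retrieved, anchors, superseded_by):
    """Return 1 if both assumed conditions hold for the retrieved set, else 0."""
    retrieved = set(retrieved)
    active = frontier(closure(anchors, superseded_by), superseded_by)
    # Condition 1: frontier inclusion.
    frontier_included = active <= retrieved
    # Condition 2 (assumed reading): no retrieved document has a superseder left out.
    no_ignored_superseder = all(
        newer in retrieved
        for doc in retrieved
        for newer in superseded_by.get(doc, ())
    )
    return int(frontier_included and no_ignored_superseder)


# Hypothetical chain: adv-1 is revoked by adv-2, which is revoked by adv-3.
superseded_by = {"adv-1": ["adv-2"], "adv-2": ["adv-3"]}
anchors = ["adv-1"]  # stands in for A_k(q)

print(tca(["adv-3"], anchors, superseded_by))  # active frontier retrieved
print(tca(["adv-1"], anchors, superseded_by))  # only the revoked anchor retrieved
```

In this toy chain the anchor itself is the revoked document, which is the situation the abstract describes: a similarity-only retriever that returns `adv-1` misses the active frontier `adv-3`.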
