Beyond Case Law: Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA
Kyubyung Chae, Jewon Yeom, Jeongjae Park, Seunghyun Bae, Ijun Jang, Hyunbin Jin, Jinkwan Jang, Taesup Kim

TL;DR
This paper introduces SearchFireSafety, a benchmark for statute-centric legal QA that evaluates hierarchical retrieval and safety, revealing improvements with graph-guided retrieval and safety trade-offs in models.
Contribution
It presents a new benchmark and evaluation framework for statute-centric legal QA, emphasizing hierarchical retrieval and safety considerations.
Findings
Graph-guided retrieval improves performance in statutory QA.
Models tend to hallucinate more when statutory evidence is missing.
Benchmark evaluates both retrieval accuracy and safety in legal question answering.
Abstract
Legal QA benchmarks have predominantly focused on case law, overlooking the unique challenges of statute-centric regulatory reasoning. In statutory domains, relevant evidence is distributed across hierarchically linked documents, creating a statutory retrieval gap where conventional retrievers fail and models often hallucinate under incomplete context. We introduce SearchFireSafety, a structure- and safety-aware benchmark for statute-centric legal QA. Instantiated on fire-safety regulations as a representative case, the benchmark evaluates whether models can retrieve hierarchically fragmented evidence and safely abstain when statutory context is insufficient. SearchFireSafety adopts a dual-source evaluation framework combining real-world questions that require citation-aware retrieval and synthetic partial-context scenarios that stress-test hallucination and refusal behavior.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
