TriBench-Ko: Evaluating LLM Risks in Judicial Workflows

Haesung Lee; Gyubin Choi; Eun-Ju Lee; So-Min Lee; Youkang Ko; Dogyoon Lim; Sung-Kyoung Jang; Yohan Jo

arXiv:2605.03792·cs.CL·May 6, 2026

TL;DR

TriBench-Ko is a Korean benchmark for evaluating the performance and deployment risks of large language models on real judicial tasks. Evaluated models show weak precedent retrieval, hallucination and bias, and frequent omission of critical legal information.

Contribution

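The paper introduces a benchmark for assessing LLM deployment risks in judicial workflows. It covers four core legal tasks (jurisprudence summarization, precedent retrieval, legal issue extraction, and evidence analysis) and scores model behavior across four risk categories: inaccuracy, bias, inconsistency, and adjudicative overreach.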

Findings

01. Many LLMs struggle with precedent retrieval.

02. Models often manifest significant risks like hallucination and bias.

03. Outputs frequently fail to capture critical legal information.

Abstract

Large language models (LLMs) are increasingly integrated into legal workflows. However, existing benchmarks primarily address proxy tasks, such as bar examination performance or classification, which fail to capture the performance and risks inherent in day-to-day judicial processes. To address this, we publicly release TriBench-Ko, a Korean benchmark designed to evaluate potential deployment risks of LLMs within the context of verified judicial task requirements. It covers four core tasks: jurisprudence summarization, precedent retrieval, legal issue extraction, and evidence analysis. It jointly assesses model behavior across multiple deployment risk categories, including inaccuracy (hallucination, omission, statutory misapplication), biases (demographic, overcompliance), inconsistencies (prompt sensitivity, non-determinism), and adjudicative overreach. Each item is structured to…
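The tasks and risk categories listed in the abstract can be written down as a small data structure. The following is a minimal sketch, not the benchmark's actual schema: the names `TASKS`, `RISK_TAXONOMY`, `risk_labels`, and `evaluation_grid` are hypothetical, and pairing every task with every risk is an assumption made for illustration (the abstract does not say that each risk is measured on each task).

```python
# Hypothetical sketch: the task and risk names come from the abstract,
# but this layout is NOT the schema used in the TriBench-Ko repository.

TASKS = [
    "jurisprudence summarization",
    "precedent retrieval",
    "legal issue extraction",
    "evidence analysis",
]

RISK_TAXONOMY = {
    "inaccuracy": ["hallucination", "omission", "statutory misapplication"],
    "bias": ["demographic", "overcompliance"],
    "inconsistency": ["prompt sensitivity", "non-determinism"],
    "adjudicative overreach": [],  # the abstract lists no subtypes here
}


def risk_labels(taxonomy):
    """Flatten the taxonomy into 'category/subtype' labels.

    A category without subtypes is kept as a single label.
    """
    labels = []
    for category, subtypes in taxonomy.items():
        if subtypes:
            labels.extend(f"{category}/{subtype}" for subtype in subtypes)
        else:
            labels.append(category)
    return labels


def evaluation_grid(tasks, taxonomy):
    """Pair every task with every risk label (an illustrative assumption)."""
    return [(task, label) for task in tasks for label in risk_labels(taxonomy)]


if __name__ == "__main__":
    for task, label in evaluation_grid(TASKS, RISK_TAXONOMY):
        print(f"{task:30s} {label}")
```

Keeping the taxonomy as plain data makes it easy to add or rename a risk subtype without touching the code that iterates over it.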

Code & Models

Repositories

holi-lab/TriBench-Ko (GitHub)
