LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model
Youngjoon Jang, Chanhee Park, Hyeonseok Moon, Young-kyoung Ham, Jiwon Moon, Jinhyeon Kim, JuKyung Jung, Heuiseok Lim

TL;DR
LegalMidm is a Korean legal-domain LLM developed through a use-case-driven training framework emphasizing collaboration with legal experts and high-quality data curation for improved legal task performance.
Contribution
The paper introduces a systematic, use-case-driven training methodology for legal LLMs, specifically tailored for Korean law, with a focus on data quality and practical applicability.
Findings
LegalMidm performs effectively on key legal tasks.
Use-case-driven dataset construction improves model relevance.
Collaboration with legal professionals enhances data quality.
Abstract
In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and training protocols that are not aligned with the nuanced requirements of real-world applications. In the legal domain, where precision and reliability are essential, this lack of consideration limits practical utility. In this study, we propose a systematic training framework grounded in the practical needs of the legal domain, with a focus on Korean law. We introduce LegalMidm, a Korean legal-domain LLM, and present a methodology for constructing high-quality, use-case-driven legal datasets and optimized training pipelines. Our approach emphasizes collaboration with legal professionals and rigorous data curation to ensure relevance and factual accuracy, and…
