DALDALL: Data Augmentation for Lexical and Semantic Diverse in Legal Domain by leveraging LLM-Persona
Janghyeok Choi, Jaewon Lee, Sungzoon Cho

TL;DR
DALDALL introduces a persona-based data augmentation framework for legal IR. It increases the lexical and semantic diversity of synthetic queries to improve retrieval performance in low-resource legal domains.
Contribution
The paper presents DALDALL, a novel augmentation method that prompts an LLM with domain-specific professional personas to generate high-quality, diverse synthetic queries for legal information retrieval.
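To make the idea concrete, here is a minimal sketch of persona-conditioned query generation. It assumes that each persona rewrites the same legal passage into its own set of search queries, and that the resulting (query, passage) pairs serve as synthetic training data. The persona names come from the abstract. The persona descriptions, the prompt template, and the helpers `build_prompts` and `augment` are hypothetical illustrations and are not the authors' prompts or code. The LLM call is abstracted as a `generate` callable you supply.

```python
from dataclasses import dataclass

# Personas named in the abstract; the descriptions are illustrative assumptions.
PERSONAS = {
    "attorney": "a practicing attorney preparing arguments for a client",
    "prosecutor": "a prosecutor building a case for the state",
    "judge": "a judge researching precedent before issuing a ruling",
}

# Hypothetical prompt template (not the one used in the paper).
TEMPLATE = (
    "You are {description}. Read the legal passage below and write {k} distinct "
    "search queries you would use to find it, one per line.\n\nPassage:\n{passage}"
)


@dataclass
class SyntheticQuery:
    persona: str
    query: str
    passage_id: str  # the source passage is the positive target for this query


def build_prompts(passage, k=3, personas=PERSONAS):
    """Return one persona-conditioned prompt per persona for the given passage."""
    return {
        name: TEMPLATE.format(description=desc, k=k, passage=passage)
        for name, desc in personas.items()
    }


def augment(passage_id, passage, generate, k=3):
    """Generate up to k queries per persona.

    `generate` is an assumed callable that maps a prompt string to the LLM's
    raw text output; any chat or completion client can be wrapped this way.
    """
    queries = []
    for persona, prompt in build_prompts(passage, k).items():
        # Keep non-empty output lines, truncated to the requested count.
        lines = [line.strip() for line in generate(prompt).splitlines() if line.strip()]
        queries.extend(SyntheticQuery(persona, q, passage_id) for q in lines[:k])
    return queries
```

Because the same passage is paired with queries written from several professional viewpoints, one passage yields differently phrased queries for the same relevance target. This is the kind of variation that the Self-BLEU comparison in the abstract is meant to capture.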
Findings
Persona-based augmentation increases lexical diversity of queries.
Retrievers fine-tuned on the augmented data outperform baseline models.
Semantic fidelity of generated queries is maintained.
Abstract
Data scarcity remains a persistent challenge in low-resource domains. Existing data augmentation methods use the generative capabilities of large language models (LLMs) to produce large volumes of synthetic data, but these approaches often prioritize quantity over quality and lack domain-specific strategies. In this work, we introduce DALDALL, a persona-based data augmentation framework tailored for legal information retrieval (IR). Our method employs domain-specific professional personas (such as attorneys, prosecutors, and judges) to generate synthetic queries that exhibit substantially greater lexical and semantic diversity than queries from vanilla prompting approaches. Experiments on the CLERC and COLIEE benchmarks demonstrate that persona-based augmentation improves lexical diversity, as measured by Self-BLEU scores, while preserving semantic fidelity to the original…
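Self-BLEU, the diversity metric cited above, scores each generated query with BLEU against all other queries in the set as references, and then averages those scores. A higher Self-BLEU means more n-gram overlap, so lower values indicate greater lexical diversity. The following minimal sketch uses only the standard library. The paper's exact tokenizer, n-gram order, and smoothing are not stated here. Whitespace tokenization, `max_n=4`, and add-one smoothing for n ≥ 2 are assumptions, so absolute scores may differ from the reported ones.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, references, max_n=4):
    """Smoothed sentence BLEU of one tokenized hypothesis against tokenized references.

    Uses clipped n-gram precision, add-one smoothing for n >= 2 (an assumption),
    and a brevity penalty computed against the closest reference length.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        matches = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)
        log_precision_sum += math.log(precision)
    c = len(hypothesis)
    # Closest reference length; ties resolve to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def self_bleu(queries, max_n=4):
    """Average BLEU of each query against all the others (higher = less diverse)."""
    tokenized = [q.lower().split() for q in queries]
    if len(tokenized) < 2:
        raise ValueError("Self-BLEU needs at least two queries")
    scores = [
        sentence_bleu(hyp, tokenized[:i] + tokenized[i + 1:], max_n)
        for i, hyp in enumerate(tokenized)
    ]
    return sum(scores) / len(scores)
```

Identical queries score 1.0. Queries with no shared words score 0.0. In this framing, an augmented query set with a lower Self-BLEU than the vanilla-prompting set is the lexically more diverse one.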
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Law · Persona Design and Applications
