Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Xubo Lin; Zezhi Deng; Shihao Wang; Grace Hui Yang; Yang Deng

arXiv:2605.14057·cs.CL·May 18, 2026

TL;DR

This paper introduces a dual hierarchical reinforcement learning framework for proactive legal dialogue agents that strategically ask questions to achieve specific objectives in high-stakes scenarios.

Contribution

The paper develops a dual hierarchical RL architecture for inquisitive legal dialogue management, in which two cooperating agents learn when and how to ask questions.

Findings

01 Outperforms baseline methods on a U.S. Supreme Court dataset

02 Successfully emulates judicial questioning patterns

03 Systematically uncovers crucial legal information

Abstract

Most existing dialogue systems are user-driven, primarily designed to fulfill user requests. However, in many critical real-world scenarios, a conversational agent must proactively extract information to achieve its own objectives rather than merely respond. To address this gap, we introduce Inquisitive Conversational Agents (ICAs) and develop an ICA specifically tailored to U.S. Supreme Court oral arguments. We propose a Dual Hierarchical Reinforcement Learning framework featuring two cooperating RL agents, each with its own policy, to coordinate strategic dialogue management and fine-grained utterance generation. By learning when and how to ask probing questions, the agent emulates judicial questioning patterns and systematically uncovers crucial information to fulfill its legal objectives. Evaluations on a U.S. Supreme Court dataset show that our method outperforms various baselines…
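
The two-level idea in the abstract (one policy deciding the dialogue strategy, a second deciding how the utterance is realized) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the action names, the dialogue states, the reward function, and the tabular epsilon-greedy learner are all hypothetical stand-ins, not the paper's architecture, training algorithm, or data.

```python
import random
from collections import defaultdict

# Hypothetical action spaces; the abstract does not specify the real ones.
STRATEGIES = ["ask_clarifying", "ask_hypothetical", "stay_silent"]  # high level: when/what to ask
STYLES = ["short", "detailed"]                                      # low level: how to phrase it


class TabularPolicy:
    """Epsilon-greedy policy with a running-average reward table (a simple stand-in learner)."""

    def __init__(self, actions, epsilon=0.2, lr=0.5, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated reward

    def best(self, state):
        # Greedy action; ties resolve to the earliest action in the list.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(state)

    def update(self, state, action, reward):
        key = (state, action)
        self.q[key] += self.lr * (reward - self.q[key])


def toy_reward(state, strategy, style):
    """Made-up reward: probe vague answers in detail, stay quiet after clear ones."""
    if state == "vague_answer" and strategy == "ask_clarifying":
        return 1.0 if style == "detailed" else 0.5
    if state == "clear_answer" and strategy == "stay_silent":
        return 1.0
    return 0.0


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    high = TabularPolicy(STRATEGIES, seed=seed + 1)
    low = TabularPolicy(STYLES, seed=seed + 2)
    for _ in range(episodes):
        state = rng.choice(["vague_answer", "clear_answer"])
        strategy = high.act(state)      # high-level decision
        low_state = (state, strategy)   # low level conditions on the chosen strategy
        style = low.act(low_state)      # low-level decision
        reward = toy_reward(state, strategy, style)
        high.update(state, strategy, reward)  # both policies share the same reward signal
        low.update(low_state, style, reward)
    return high, low


if __name__ == "__main__":
    high, low = train()
    for s in ("vague_answer", "clear_answer"):
        strategy = high.best(s)
        print(s, "->", strategy, "/", low.best((s, strategy)))
```

The point of the sketch is only the control flow: the low-level policy sees the high-level choice as part of its state, and both are updated from a shared reward. A real system would replace the lookup tables with learned models and the toy reward with a task-specific signal.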
