Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue

Yangyang Zhao; Linfan Dai; Li Cai; Bowen Xing; Libo Qin

arXiv:2604.23345 · cs.CL · April 28, 2026

TL;DR

This paper introduces VLK-RL, a hybrid framework that combines verified large language model reasoning with reinforcement learning to improve cross-domain task-oriented dialogue performance.

Contribution

The paper proposes VLK-RL, a hybrid LLM-RL framework in which constraints inferred by an LLM are verified before being passed to reinforcement learning. This improves constraint reasoning and robustness in task-oriented dialogue systems.

Findings

01. VLK-RL outperforms baseline models on long-horizon tasks.

02. The dual-role verification process reduces hallucinated constraints and cross-turn inconsistencies.

03. Structuring the verified constraints improves the effectiveness of the RL policy.

Abstract

Cross-domain task-oriented dialogue requires reasoning over implicit and explicit feasibility constraints while planning long-horizon, multi-turn actions. Large language models (LLMs) can infer such constraints but are unreliable over long horizons, while reinforcement learning (RL) optimizes long-horizon behavior yet cannot recover constraints from raw dialogue. Naively coupling LLMs with RL is therefore brittle: unverified or unstructured LLM outputs can corrupt state representations and misguide policy learning. Motivated by this, we propose Verified LLM-Knowledge empowered RL (VLK-RL), a hybrid framework that makes LLM-derived constraint reasoning usable for RL. VLK-RL first elicits candidate constraints with an LLM and then verifies them via a dual-role cross-examination procedure to suppress hallucinations and cross-turn inconsistencies. The verified constraints are mapped into…
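The abstract describes the verification step only at a high level. The following is a rough illustration, not the authors' implementation. It assumes that each candidate constraint is a slot-value pair. It treats the two roles of the cross-examination as a proposer, which must be able to justify a candidate, and an examiner, which checks it independently. A candidate is kept only if both roles accept it and it does not contradict a previously verified value. `Constraint`, `cross_examine`, the toy role functions, and the rule that an explicit user statement overrides an earlier value are all hypothetical. Real roles would be LLM prompts rather than string-matching stubs. The abstract is cut off before it says what the verified constraints are mapped into, so that step is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    slot: str       # hypothetical slot name, e.g. "hotel.area"
    value: str      # e.g. "north"
    explicit: bool  # True if stated by the user, False if inferred


# Hypothetical role interface: (dialogue turns, candidate) -> verdict.
# In a real system each role would be an LLM call; here it is any callable.
Role = Callable[[List[str], Constraint], bool]


def cross_examine(
    dialogue: List[str],
    candidates: List[Constraint],
    proposer: Role,
    examiner: Role,
    verified: Dict[str, Constraint],
) -> Dict[str, Constraint]:
    """Filter proposed constraints and merge survivors into a copy of `verified`."""
    updated = dict(verified)  # do not mutate the caller's verified set
    for cand in candidates:
        # Both roles must accept; otherwise treat it as a possible hallucination.
        if not (proposer(dialogue, cand) and examiner(dialogue, cand)):
            continue
        prior = updated.get(cand.slot)
        if prior is not None and prior.value != cand.value and not cand.explicit:
            # Cross-turn conflict from an inferred candidate: keep the earlier value.
            continue
        # New slot, same value, or an explicit user revision: accept it.
        updated[cand.slot] = cand
    return updated


def mentioned(dialogue: List[str], c: Constraint) -> bool:
    # Toy proposer: the value must literally appear in some turn.
    return any(c.value in turn.lower() for turn in dialogue)


def not_negated(dialogue: List[str], c: Constraint) -> bool:
    # Toy examiner: reject values the user explicitly ruled out.
    return not any(f"not the {c.value}" in turn.lower() for turn in dialogue)


dialogue = ["I need a hotel in the north, not the centre.", "Cheap, please."]
candidates = [
    Constraint("hotel.area", "north", explicit=True),
    Constraint("hotel.area", "centre", explicit=False),  # rejected by examiner
    Constraint("hotel.price", "cheap", explicit=True),
    Constraint("hotel.stars", "5", explicit=False),      # unsupported: rejected
]
verified = cross_examine(dialogue, candidates, mentioned, not_negated, {})
print(sorted((c.slot, c.value) for c in verified.values()))
# → [('hotel.area', 'north'), ('hotel.price', 'cheap')]
```

The conflict rule is one possible design choice. It assumes that a value the user states outright should be able to override an earlier one, while an inferred value should not. A system could instead send conflicts back to the LLM for another round of examination.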
