ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents

Jie-Jing Shao; Bo-Wen Zhang; Xiao-Wen Yang; Baizhi Chen; Si-Yu Han; Jinghao Pang; Wen-Da Wei; Guohao Cai; Zhenhua Dong; Lan-Zhe Guo; Yu-Feng Li

arXiv:2412.13682·cs.AI·April 30, 2026

TL;DR

ChinaTravel introduces a comprehensive benchmark for evaluating language agents in open-ended travel planning, emphasizing compositional constraints, implicit user requirements, and neuro-symbolic approaches.

Contribution

It provides a realistic sandbox, a generalizable DSL, an open-ended dataset, and analysis demonstrating neuro-symbolic agents' potential and challenges.

Findings

1. Neuro-symbolic agents achieved a 37.0% constraint satisfaction rate on human queries.

2. Purely neural models were far less effective, with a constraint satisfaction rate roughly 10x lower than that of the neuro-symbolic agents.

3. The benchmark highlights challenges in compositional generalization for travel planning.

Abstract

Travel planning stands out among real-world applications of Language Agents because it couples significant practical demand with a rigorous constraint-satisfaction challenge. However, existing benchmarks primarily operate on a slot-filling paradigm, restricting agents to synthetic queries with pre-defined constraint menus. This fails to capture the open-ended nature of natural language interaction, where user requirements are compositional, diverse, and often implicitly expressed. To address this gap, we introduce ChinaTravel, with four key contributions: 1) a practical sandbox aligned with multi-day, multi-POI travel planning, 2) a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison, 3) an open-ended dataset that integrates diverse travel requirements and…
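To make the idea of compositional constraint validation concrete, here is a minimal Python sketch. It is not the ChinaTravel DSL: the `Activity` type, the helper names (`total_cost_at_most`, `visits`, `all_of`, `negate`, `satisfaction_rate`), and the sample plan with its prices are all hypothetical, chosen only to illustrate how small predicates over a plan might be combined and scored. The paper's actual metric may be defined differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Activity:
    """One entry in a plan (hypothetical schema, not the benchmark's)."""
    day: int
    kind: str   # e.g. "attraction" or "restaurant"
    name: str
    cost: float


Plan = List[Activity]
Constraint = Callable[[Plan], bool]


def total_cost_at_most(budget: float) -> Constraint:
    """Atomic constraint: the summed cost of the plan stays within budget."""
    return lambda plan: sum(a.cost for a in plan) <= budget


def visits(name: str) -> Constraint:
    """Atomic constraint: the plan includes an activity with this name."""
    return lambda plan: any(a.name == name for a in plan)


def all_of(*constraints: Constraint) -> Constraint:
    """Composition: every sub-constraint must hold."""
    return lambda plan: all(c(plan) for c in constraints)


def negate(constraint: Constraint) -> Constraint:
    """Composition: the sub-constraint must not hold."""
    return lambda plan: not constraint(plan)


def satisfaction_rate(plan: Plan, constraints: List[Constraint]) -> float:
    """Fraction of constraints the plan satisfies (1.0 if there are none)."""
    if not constraints:
        return 1.0
    return sum(1 for c in constraints if c(plan)) / len(constraints)


# Illustrative plan; names and prices are made up for this example.
plan = [
    Activity(1, "attraction", "West Lake", 0.0),
    Activity(1, "restaurant", "Noodle House", 80.0),
    Activity(2, "attraction", "Lingyin Temple", 75.0),
]

constraints = [
    total_cost_at_most(200.0),
    visits("West Lake"),
    all_of(visits("Lingyin Temple"), negate(visits("Leifeng Pagoda"))),
    total_cost_at_most(100.0),
]

rate = satisfaction_rate(plan, constraints)
print(f"constraint satisfaction rate: {rate:.2f}")
```

Building constraints from a few atomic predicates and combinators is one way an evaluator could cover requirements that no fixed constraint menu anticipates, which is the gap the abstract attributes to slot-filling benchmarks.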

Videos

ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents (SlidesLive)