ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
Jie-Jing Shao, Bo-Wen Zhang, Xiao-Wen Yang, Baizhi Chen, Si-Yu Han, Jinghao Pang, Wen-Da Wei, Guohao Cai, Zhenhua Dong, Lan-Zhe Guo, Yu-Feng Li

TL;DR
ChinaTravel introduces a comprehensive benchmark for evaluating language agents in open-ended travel planning, emphasizing compositional constraints, implicit user requirements, and neuro-symbolic approaches.
Contribution
It provides a realistic sandbox, a generalizable DSL, an open-ended dataset, and analysis demonstrating neuro-symbolic agents' potential and challenges.
Findings
Neuro-symbolic agents achieved 37.0% constraint satisfaction on human queries.
Pure neural models are far less effective, with a constraint satisfaction rate roughly 10x lower than that of neuro-symbolic agents.
The benchmark highlights challenges in compositional generalization for travel planning.
Abstract
Travel planning stands out among real-world applications of Language Agents because it couples significant practical demand with a rigorous constraint-satisfaction challenge. However, existing benchmarks primarily operate on a slot-filling paradigm, restricting agents to synthetic queries with pre-defined constraint menus. This fails to capture the open-ended nature of natural language interaction, where user requirements are compositional, diverse, and often implicitly expressed. To address this gap, we introduce ChinaTravel, with four key contributions: 1) a practical sandbox aligned with multi-day, multi-POI travel planning, 2) a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison, 3) an open-ended dataset that integrates diverse travel requirements and…
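To make the idea of compositional constraint validation concrete, here is a minimal Python sketch. It is not ChinaTravel's actual DSL, whose syntax is not shown in this summary. Every name in it (`Activity`, `total_cost_at_most`, `visits`, `all_of`, `negate`, `satisfaction_rate`) and the sample itinerary are hypothetical, chosen only to illustrate how small predicates over a plan can be combined into larger requirements and scored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    """One itinerary entry (hypothetical schema, not the benchmark's)."""
    day: int
    kind: str   # e.g. "attraction", "restaurant", "hotel"
    name: str
    cost: float


Plan = List[Activity]
Constraint = Callable[[Plan], bool]


def total_cost_at_most(limit: float) -> Constraint:
    """Atomic constraint: the summed cost of the plan stays within a budget."""
    return lambda plan: sum(a.cost for a in plan) <= limit


def visits(name: str) -> Constraint:
    """Atomic constraint: the plan contains an activity with this name."""
    return lambda plan: any(a.name == name for a in plan)


def all_of(*constraints: Constraint) -> Constraint:
    """Composition: every sub-constraint must hold."""
    return lambda plan: all(c(plan) for c in constraints)


def negate(constraint: Constraint) -> Constraint:
    """Composition: the sub-constraint must not hold."""
    return lambda plan: not constraint(plan)


def satisfaction_rate(plan: Plan, constraints: List[Constraint]) -> float:
    """Fraction of the listed constraints that the plan satisfies."""
    if not constraints:
        return 1.0
    return sum(1 for c in constraints if c(plan)) / len(constraints)


# Illustrative itinerary; names and prices are made up.
plan = [
    Activity(day=1, kind="attraction", name="West Lake", cost=0.0),
    Activity(day=1, kind="restaurant", name="Noodle House", cost=80.0),
    Activity(day=2, kind="hotel", name="Lakeside Inn", cost=400.0),
]

# "Stay under 500, see West Lake, and skip the museum" as one composed check.
request = all_of(
    total_cost_at_most(500),
    visits("West Lake"),
    negate(visits("City Museum")),
)
```

Because each constraint is just a function from a plan to a boolean, new requirements can be expressed by combining existing pieces rather than by extending a fixed menu, which is the property the abstract contrasts with slot-filling benchmarks.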
