Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents

Juhyun Oh; Eunsu Kim; Alice Oh

arXiv:2506.04649·cs.CL·June 6, 2025

TL;DR

Flex-TravelPlanner introduces a new benchmark for evaluating language models' ability to adapt to dynamic, multi-turn planning scenarios with competing constraints, revealing limitations in current models' flexibility and prioritization skills.

Contribution

This work presents Flex-TravelPlanner, a novel benchmark with dynamic, multi-turn planning scenarios and constraint prioritization, extending existing static planning evaluations for language models.

Findings

01

Models perform poorly on multi-turn adaptation tasks.

02

Order of constraint introduction significantly impacts performance.

03

Models often misprioritize constraints, favoring recent lower-priority ones.

Abstract

Real-world planning problems require constant adaptation to changing requirements and balancing of competing constraints. However, current benchmarks for evaluating LLMs' planning capabilities primarily focus on static, single-turn scenarios. We introduce Flex-TravelPlanner, a benchmark that evaluates language models' ability to reason flexibly in dynamic planning scenarios. Building on the TravelPlanner dataset (Xie et al., 2024), we introduce two novel evaluation settings: (1) sequential constraint introduction across multiple turns, and (2) scenarios with explicitly prioritized competing constraints. Our analysis of GPT-4o and Llama 3.1 70B reveals several key findings: models' performance on single-turn tasks poorly predicts their ability to adapt plans across multiple turns; constraint introduction order significantly affects performance; and models struggle with…
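To make the two evaluation settings concrete, here is a minimal sketch of how sequential constraint introduction and priority checking might be simulated. It is not the benchmark's actual harness: the `Constraint` class, the `run_multi_turn` and `misprioritized` helpers, the toy plan schema, and the `recency_biased_planner` stand-in (used in place of a real model call) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy plan schema (an assumption): field name -> numeric value.
Plan = Dict[str, float]


@dataclass
class Constraint:
    name: str
    check: Callable[[Plan], bool]
    priority: int = 0  # assumed convention: higher number = more important


def run_multi_turn(planner, constraints: List[Constraint]) -> List[Dict[str, bool]]:
    """Reveal one constraint per turn and record which revealed
    constraints the revised plan satisfies after each turn."""
    revealed: List[Constraint] = []
    plan: Plan = {}
    history = []
    for constraint in constraints:
        revealed.append(constraint)
        plan = planner(plan, revealed)  # planner revises the previous plan
        history.append({c.name: c.check(plan) for c in revealed})
    return history


def misprioritized(final_turn: Dict[str, bool], constraints: List[Constraint]) -> bool:
    """True if a violated constraint outranks some satisfied constraint."""
    by_name = {c.name: c for c in constraints}
    violated = [by_name[n].priority for n, ok in final_turn.items() if not ok]
    satisfied = [by_name[n].priority for n, ok in final_turn.items() if ok]
    return any(v > s for v in violated for s in satisfied)


def recency_biased_planner(plan: Plan, revealed: List[Constraint]) -> Plan:
    """Hypothetical stand-in for a model: honors only the newest constraint."""
    if revealed[-1].name == "budget":
        return {"cost": 900.0, "rating": 3.0}
    return {"cost": 1500.0, "rating": 4.5}


budget = Constraint("budget", lambda p: p["cost"] <= 1000.0, priority=2)
rating = Constraint("rating", lambda p: p["rating"] >= 4.0, priority=1)

# Same constraints, two introduction orders.
history_a = run_multi_turn(recency_biased_planner, [budget, rating])
history_b = run_multi_turn(recency_biased_planner, [rating, budget])

print(history_a[-1], misprioritized(history_a[-1], [budget, rating]))
print(history_b[-1], misprioritized(history_b[-1], [budget, rating]))
```

With this toy planner, the final plan depends on which constraint arrives last, and only the first ordering sacrifices the higher-priority budget constraint. That mirrors, in a deliberately simplified form, the order sensitivity and misprioritization the abstract describes.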

Code & Models

Repositories

juhyunohh/flextravelbench
Official

Taxonomy

Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Artificial Intelligence in Games

Methods: Focus · LLaMA