TripTailor: A Real-World Benchmark for Personalized Travel Planning
Yuanzhe Shen, Kaimin Wang, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

TL;DR
TripTailor is a benchmark built on extensive real-world POI and itinerary data for evaluating and improving personalized travel planning by large language models; it highlights their current limitations and challenges.
Contribution
We present TripTailor, a novel benchmark with real-world data for assessing and advancing personalized travel planning in LLMs, addressing limitations of previous simulated benchmarks.
Findings
Fewer than 10% of LLM-generated itineraries reach human-level performance.
Key challenges identified are the feasibility, rationality, and personalization of generated itineraries.
The TripTailor dataset enables more authentic evaluation of travel planning models.
Abstract
The continuous evolution and enhanced reasoning capabilities of large language models (LLMs) have elevated their role in complex tasks, notably in travel planning, where demand for personalized, high-quality itineraries is rising. However, current benchmarks often rely on unrealistic simulated data, failing to reflect the differences between LLM-generated and real-world itineraries. Existing evaluation metrics, which primarily emphasize constraints, fall short of providing a comprehensive assessment of the overall quality of travel plans. To address these limitations, we introduce TripTailor, a benchmark designed specifically for personalized travel planning in real-world scenarios. This dataset features an extensive collection of over 500,000 real-world points of interest (POIs) and nearly 4,000 diverse travel itineraries, complete with detailed information, providing a more authentic…
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Data Management and Algorithms · Multimodal Machine Learning Applications
