TL;DR
TripCraft introduces a realistic, constraint-aware travel planning dataset with new evaluation metrics, enabling better assessment of LLM-generated itineraries and advancing personalized travel planning research.
Contribution
It provides a new spatiotemporally coherent dataset with real-world constraints and continuous evaluation metrics, improving on prior semi-synthetic travel planning datasets.
Findings
A parameter-informed setting improved the Temporal Meal Score from 61% to 80%.
TripCraft is more realistic than previous datasets and integrates more real-world travel constraints.
Five new continuous evaluation metrics assess itinerary quality in more detail than binary validation.
Abstract
Recent advancements in probing Large Language Models (LLMs) have explored their latent potential as personalized travel planning agents, yet existing benchmarks remain limited in real-world applicability. Existing datasets, such as TravelPlanner and TravelPlanner+, suffer from reliance on semi-synthetic data, spatial inconsistencies, and a lack of key travel constraints, making them inadequate for practical itinerary generation. To address these gaps, we introduce TripCraft, a spatiotemporally coherent travel planning dataset that integrates real-world constraints, including public transit schedules, event availability, diverse attraction categories, and user personas for enhanced personalization. To evaluate LLM-generated plans beyond existing binary validation methods, we propose five continuous evaluation metrics, namely Temporal Meal Score, Temporal Attraction Score, Spatial Score,…
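The abstract contrasts binary validation with continuous metrics such as the Temporal Meal Score. The following is a hypothetical sketch of that distinction only, not TripCraft's metric definition: the meal windows, the 120-minute tolerance, the linear decay, and the helper names `binary_meal_check`, `continuous_meal_score`, and `plan_meal_score` are all assumptions made for illustration.

```python
# Hypothetical meal windows in minutes since midnight (assumption, not from TripCraft).
MEAL_WINDOWS = {
    "breakfast": (7 * 60, 10 * 60),
    "lunch": (12 * 60, 15 * 60),
    "dinner": (19 * 60, 22 * 60),
}
# Hypothetical tolerance: partial credit decays linearly to 0 over this many minutes.
TOLERANCE_MIN = 120


def binary_meal_check(meal: str, start_min: int) -> bool:
    """Pass/fail validation: is the meal scheduled inside its window?"""
    lo, hi = MEAL_WINDOWS[meal]
    return lo <= start_min <= hi


def continuous_meal_score(meal: str, start_min: int) -> float:
    """Partial credit: 1.0 inside the window, decaying linearly to 0.0 outside it."""
    lo, hi = MEAL_WINDOWS[meal]
    if lo <= start_min <= hi:
        return 1.0
    gap = lo - start_min if start_min < lo else start_min - hi
    return max(0.0, 1.0 - gap / TOLERANCE_MIN)


def plan_meal_score(plan: list[tuple[str, int]]) -> float:
    """Average the per-meal scores over an itinerary's (meal, start_minute) entries."""
    if not plan:
        return 0.0
    return sum(continuous_meal_score(m, t) for m, t in plan) / len(plan)


# A toy itinerary: breakfast on time, lunch 1 h late, dinner 1.5 h late.
plan = [("breakfast", 8 * 60), ("lunch", 16 * 60), ("dinner", 23 * 60 + 30)]
print([binary_meal_check(m, t) for m, t in plan])  # [True, False, False]
print(round(plan_meal_score(plan), 3))  # 0.583
```

Under these assumptions, binary validation marks the lunch and dinner entries as simply wrong, while the continuous score still distinguishes a slightly late meal from a badly misplaced one.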
