Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction
Yiming Xu, Junfeng Jiao

TL;DR
This paper investigates whether retrieval-augmented large language models can improve the accuracy and generalization of travel mode choice prediction. It reports that grounding LLM predictions in retrieved empirical data lets them outperform traditional statistical and machine learning models.
Contribution
It introduces a modular framework for integrating RAG into LLM-based travel mode choice prediction and evaluates four retrieval strategies across three LLMs (OpenAI GPT-4o, o4-mini, and o3).
Findings
RAG significantly improves prediction accuracy.
GPT-4o with balanced retrieval and cross-encoder re-ranking achieves 80.8% accuracy.
LLMs outperform traditional statistical and machine learning models.
Abstract
Accurately predicting travel mode choice is essential for effective transportation planning, yet traditional statistical and machine learning models are constrained by rigid assumptions, limited contextual reasoning, and reduced generalizability. This study explores the potential of Large Language Models (LLMs) as a more flexible and context-aware approach to travel mode choice prediction, enhanced by Retrieval-Augmented Generation (RAG) to ground predictions in empirical data. We develop a modular framework for integrating RAG into LLM-based travel mode choice prediction and evaluate four retrieval strategies: basic RAG, RAG with balanced retrieval, RAG with a cross-encoder for re-ranking, and RAG with balanced retrieval and cross-encoder for re-ranking. These strategies are tested across three LLM architectures (OpenAI GPT-4o, o4-mini, and o3) to examine the interaction between model…
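The four retrieval strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the excerpt above does not describe. The toy embeddings, the function names (`basic_retrieve`, `balanced_retrieve`, `rerank`), the per-mode quota, and the `score_fn` callable standing in for a cross-encoder are all assumptions made for illustration. "Balanced retrieval" is interpreted here as drawing an equal number of examples from each travel mode, which is one plausible reading of the term.

```python
from collections import defaultdict
from math import sqrt
from typing import Callable, List, Sequence, Tuple

# Hypothetical record type: (embedding of a past trip, observed travel mode).
Record = Tuple[Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def basic_retrieve(query: Sequence[float], records: List[Record], k: int) -> List[Record]:
    """Basic RAG: return the k records most similar to the query."""
    return sorted(records, key=lambda r: cosine(query, r[0]), reverse=True)[:k]


def balanced_retrieve(query: Sequence[float], records: List[Record], per_mode: int) -> List[Record]:
    """Balanced retrieval (assumed meaning): take the top `per_mode` records
    from each travel mode, so frequent modes cannot crowd out rare ones."""
    by_mode = defaultdict(list)
    for rec in records:
        by_mode[rec[1]].append(rec)
    picked: List[Record] = []
    for mode in sorted(by_mode):
        picked.extend(basic_retrieve(query, by_mode[mode], per_mode))
    # Order the pooled examples by similarity before they go into a prompt.
    return sorted(picked, key=lambda r: cosine(query, r[0]), reverse=True)


def rerank(
    query: Sequence[float],
    candidates: List[Record],
    score_fn: Callable[[Sequence[float], Record], float],
    top_n: int,
) -> List[Record]:
    """Re-ranking: reorder candidates with a second scorer and keep top_n.
    `score_fn` is a placeholder for a cross-encoder relevance model."""
    return sorted(candidates, key=lambda r: score_fn(query, r), reverse=True)[:top_n]
```

The fourth strategy is the composition of the last two: call `balanced_retrieve` to build a class-balanced candidate pool, then pass that pool to `rerank`. In this sketch the retained examples would then be formatted into the LLM prompt as context for predicting the mode of the query trip.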
