Rakuten Data Release: A Large-Scale and Long-Term Reviews Corpus for Hotel Domain
Yuki Nakayama, Koki Hikichi, Yun Ching Liu, Yu Hirate

TL;DR
This paper introduces a comprehensive, large-scale dataset of 7.29 million hotel reviews from Rakuten Travel spanning 16 years, enabling extensive analysis of review patterns and data drift over time.
Contribution
It provides a detailed, long-term reviews corpus with rich metadata and insights into data drift, facilitating research in hotel review analysis and temporal data studies.
Findings
Statistical analysis reveals significant data drift between 2019 and 2024.
The corpus covers diverse review aspects and user ratings over 16 years.
The analysis offers insights into the factors driving changes in review data over time.
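This summary does not say which statistical approach the authors used to measure drift. As one illustrative possibility, and only an assumption, the sketch below compares the distribution of overall scores in two years using total variation distance. The function names and the toy scores are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def rating_distribution(scores):
    """Return the relative frequency of each rating value."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    total = sum(counts.values())
    return {score: n / total for score, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no mass.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


# Toy, made-up overall scores for two years (NOT real corpus data).
scores_2019 = [5, 4, 4, 3, 5, 2, 4, 5]
scores_2024 = [5, 5, 4, 5, 5, 3, 4, 5]

drift = total_variation_distance(
    rating_distribution(scores_2019),
    rating_distribution(scores_2024),
)
print(f"Total variation distance between years: {drift:.2f}")
```

A larger value indicates that the score distribution shifted more between the two years. Other measures (for example a chi-squared test) would serve the same illustrative purpose.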
Abstract
This paper presents a large-scale corpus of Rakuten Travel Reviews. Our collection contains 7.29 million customer reviews spanning 16 years, from 2009 to 2024. Each record in the dataset contains the review text, the accommodation's response to it, an anonymized reviewer ID, review date, accommodation ID, plan ID, plan title, room type, room name, purpose, accompanying group, and user ratings for six aspect categories, as well as an overall score. We present statistical information about our corpus and provide insights into factors driving data drift between 2019 and 2024 using statistical approaches.
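The record layout listed in the abstract can be pictured as a small data structure. The following is a minimal sketch, not the dataset's actual schema: the field names, the types, and the placeholder aspect keys are all assumptions, since the abstract names the fields only in prose and does not name the six aspect categories.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class ReviewRecord:
    """One review, mirroring the fields listed in the abstract (names assumed)."""

    review_text: str
    accommodation_response: Optional[str]  # assumed optional: not every review may get a reply
    reviewer_id: str                       # anonymized; string type is an assumption
    review_date: date
    accommodation_id: str
    plan_id: str
    plan_title: str
    room_type: str
    room_name: str
    purpose: str
    accompanying_group: str
    # Six aspect categories; the keys are placeholders because the abstract does not name them.
    aspect_ratings: Dict[str, int] = field(default_factory=dict)
    overall_score: Optional[int] = None


# Hypothetical example record with placeholder values (not real corpus data).
record = ReviewRecord(
    review_text="Example review text.",
    accommodation_response=None,
    reviewer_id="user_0001",
    review_date=date(2019, 1, 1),
    accommodation_id="acc_0001",
    plan_id="plan_0001",
    plan_title="Example plan",
    room_type="example type",
    room_name="example room",
    purpose="example purpose",
    accompanying_group="example group",
    aspect_ratings={f"aspect_{i}": 4 for i in range(1, 7)},
    overall_score=4,
)
```

Keeping the aspect ratings in a dictionary avoids hard-coding category names that this summary does not provide.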
Taxonomy
Topics: Digital Marketing and Social Media · Diverse Aspects of Tourism Research · AI in Service Interactions
