NAIST Academic Travelogue Dataset

Hiroki Ouchi; Hiroyuki Shindo; Shoko Wakamiya; Yuki Matsuda; Naoya Inoue; Shohei Higashiyama; Satoshi Nakamura; Taro Watanabe

arXiv:2305.11444 · cs.CL · November 19, 2025 · 1 citation

TL;DR

The NAIST Academic Travelogue Dataset (ATD) is a large Japanese travelogue corpus, released free of charge for academic research, designed to facilitate research, improve reproducibility, and enable comparative studies in natural language processing.

Contribution

We created and released a comprehensive Japanese travelogue dataset, addressing data scarcity and supporting transparent, reproducible research in the field.

Findings

01. Over 31 million words of travelogues included

02. Provides both domestic and overseas travel narratives

03. Facilitates research reproducibility and comparison

Abstract

We have constructed the NAIST Academic Travelogue Dataset (ATD) and released it free of charge for academic research. This Japanese text dataset contains over 31 million words in total, comprising 4,672 Japanese domestic travelogues and 9,607 overseas travelogues. Before our dataset was released, widely available travelogue data for research purposes was scarce, and each researcher had to prepare their own data, which hindered the replication of existing studies and fair comparative analysis of experimental results. Our dataset enables any researcher to conduct investigations on the same data and to ensure transparency and reproducibility in research. In this paper, we describe the academic significance, characteristics, and prospects of our dataset.

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Multimodal Machine Learning Applications