LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought

Ruiyan Qi; Congding Wen; Weibo Zhou; Jiwei Li; Shangsong Liang; Lingbo Li

arXiv:2508.11280 · cs.CL · August 26, 2025

TL;DR

This paper introduces LETToT, a label-free, expert tree-of-thought framework for evaluating large language models in tourism, demonstrating improved accuracy and scalability without relying on annotated data.

Contribution

We propose a novel, scalable, label-free evaluation method using expert reasoning structures, outperforming traditional benchmark approaches in domain-specific LLM assessment.

Findings

01

Expert ToT achieves 4.99-14.15% relative quality gains over baselines.

02

Scaling laws hold in tourism domain, with reasoning models closing the gap.

03

Explicit reasoning architectures outperform in accuracy and conciseness for smaller models.
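
The abstract describes the first finding as a relative quality gain over baselines. Assuming this means the usual percentage improvement of a score over a baseline score (the paper's exact metric is not stated on this page), the arithmetic can be sketched as follows. The function name and the example scores are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_gain(score: float, baseline: float) -> float:
    """Return the improvement of `score` over `baseline`, in percent.

    Assumes the conventional definition (score - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline * 100.0


# Hypothetical quality scores (illustrative only, not results from the paper).
baseline_score = 4.0
expert_tot_score = 4.2

gain = relative_gain(expert_tot_score, baseline_score)
print(f"relative gain: {gain:.2f}%")
```

Under this definition, a reported range such as 4.99-14.15% would correspond to the smallest and largest percentage improvements observed across the compared baselines.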

Abstract

Evaluating large language models (LLMs) in specific domains like tourism remains challenging due to the prohibitive cost of annotated benchmarks and persistent issues like hallucinations. We propose Label-Free Evaluation of LLMs on Tourism using Expert Tree-of-Thought (LETToT), a framework that leverages expert-derived reasoning structures, instead of labeled data, to assess LLMs in tourism. First, we iteratively refine and validate hierarchical ToT components through alignment with generic quality dimensions and expert feedback. Results demonstrate the effectiveness of our systematically optimized expert ToT, with 4.99-14.15% relative quality gains over baselines. Second, we apply LETToT's optimized expert ToT to evaluate models of varying scales (32B-671B parameters), revealing: (1) Scaling laws persist in specialized…
