TASE: Token Awareness and Structured Evaluation for Multilingual Language Models

Chenzhuo Zhao; Xinda Wang; Yue Huang; Junting Lu; Ziqian Liu

arXiv:2508.05468 · cs.CL · August 8, 2025

TL;DR

TASE is a benchmark designed to evaluate multilingual LLMs' token-level awareness and structural reasoning, revealing current models' limitations and guiding future improvements in fine-grained language understanding.

Contribution

Introduces TASE, a multilingual benchmark with a 35,927-instance evaluation set and a scalable synthetic data generation pipeline for training, to assess token-level awareness and structural reasoning in LLMs.

Findings

01. Humans outperform LLMs on TASE tasks.

02. Current LLMs show weaknesses in token-level reasoning.

03. TASE provides insights for future model improvements.

Abstract

While large language models (LLMs) have demonstrated remarkable performance on high-level semantic tasks, they often struggle with fine-grained, token-level understanding and structural reasoning, capabilities that are essential for applications requiring precision and control. We introduce TASE, a comprehensive benchmark designed to evaluate LLMs' ability to perceive and reason about token-level information across languages. TASE covers 10 tasks under two core categories: token awareness and structural understanding, spanning Chinese, English, and Korean, with a 35,927-instance evaluation set and a scalable synthetic data generation pipeline for training. Tasks include character counting, token alignment, syntactic structure parsing, and length constraint satisfaction. We evaluate over 30 leading commercial and open-source LLMs, including O3, Claude 4, Gemini 2.5 Pro, and DeepSeek-R1,…
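
To make two of the task types named in the abstract concrete (character counting and length constraint satisfaction), here is a minimal Python sketch of how such items might be generated and scored. It is an illustration under stated assumptions, not the TASE pipeline: the names `CountingInstance`, `make_counting_instance`, `satisfies_length`, and `exact_match_accuracy` are hypothetical, the prompt wording is invented, and the choices to normalize text to NFC and to measure length in Unicode code points are assumptions rather than anything the paper specifies.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class CountingInstance:
    """One hypothetical character-counting item: a prompt and its gold answer."""
    prompt: str
    answer: int


def make_counting_instance(text: str, target: str) -> CountingInstance:
    # Assumption: normalize to NFC so that precomposed Hangul syllables and
    # accented letters are each treated as a single character.
    text = unicodedata.normalize("NFC", text)
    target = unicodedata.normalize("NFC", target)
    prompt = f'How many times does "{target}" appear in "{text}"?'
    # str.count returns the number of non-overlapping occurrences.
    return CountingInstance(prompt=prompt, answer=text.count(target))


def satisfies_length(response: str, min_chars: int, max_chars: int) -> bool:
    # Assumption: length is measured in Unicode code points after NFC
    # normalization, not in model tokens or bytes.
    n = len(unicodedata.normalize("NFC", response))
    return min_chars <= n <= max_chars


def exact_match_accuracy(predictions: list[int],
                         instances: list[CountingInstance]) -> float:
    # Fraction of predictions that equal the gold count exactly.
    if len(predictions) != len(instances):
        raise ValueError("predictions and instances must have the same length")
    if not instances:
        return 0.0
    correct = sum(p == inst.answer for p, inst in zip(predictions, instances))
    return correct / len(instances)


# One illustrative item per language covered by the benchmark.
instances = [
    make_counting_instance("strawberry", "r"),   # English
    make_counting_instance("妈妈骂马吗", "妈"),    # Chinese
    make_counting_instance("바나나", "나"),        # Korean
]
```

Because the gold answer is computed directly from the string, items like these can be produced at any scale without manual labeling, which is the general appeal of a synthetic generation pipeline. A real benchmark would still need to pin down what counts as a "character" in each language.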
