What Really Counts? Examining Step and Token Level Attribution in Multilingual CoT Reasoning

Jeremias Ferrao; Ezgi Basar; Khondoker Ittehadul Islam; Mahrokh Hassani

arXiv:2511.15886 · cs.CL · November 21, 2025


Open Access

TL;DR

This paper examines how attribution methods reveal the reasoning process of multilingual LLMs, uncovering biases and limitations in interpretability and robustness across languages.

Contribution

It analyzes step-level attribution (ContextCite) and token-level attribution (Inseq) of chain-of-thought (CoT) reasoning in multilingual LLMs, highlighting challenges in faithfulness and interpretability across languages.
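The paper pairs step-level attribution (ContextCite) with token-level attribution (Inseq). One simple way to put the two granularities side by side, sketched here under our own assumptions, is to sum token-level scores within each reasoning step. The function name `aggregate_tokens_to_steps` and the token-to-step mapping `step_ids` are hypothetical; this is not necessarily how the authors compare the two methods.

```python
from typing import Sequence


def aggregate_tokens_to_steps(
    token_scores: Sequence[float],
    step_ids: Sequence[int],
    num_steps: int,
) -> list[float]:
    """Sum token-level attribution scores into per-step totals.

    Hypothetical helper: step_ids[k] is the index of the CoT step that
    token k belongs to (an assumed, precomputed segmentation).
    """
    if len(token_scores) != len(step_ids):
        raise ValueError("token_scores and step_ids must have the same length")
    totals = [0.0] * num_steps
    for score, step in zip(token_scores, step_ids):
        totals[step] += score  # accumulate each token's score into its step
    return totals


# Usage with made-up scores: four tokens spread over three steps.
print(aggregate_tokens_to_steps([0.25, 0.25, 0.5, 1.0], [0, 0, 1, 2], 3))
```

Summing raw scores keeps the totals comparable to step-level ablation scores; averaging instead would avoid favoring long steps, which is a separate design choice.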

Findings

1. Attribution scores focus excessively on the final reasoning step, particularly in incorrect generations.

2. Structured CoT prompting improves accuracy mainly in high-resource Latin-script languages.

3. Perturbations such as negation and distractor sentences reduce model accuracy and attribution coherence.
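To make the first finding concrete, one illustrative measure (our assumption, not necessarily the metric reported in the paper) is the share of total absolute attribution that lands on the final reasoning step, given per-step scores that have already been computed. The helper name `final_step_share` is hypothetical.

```python
from typing import Sequence


def final_step_share(step_scores: Sequence[float]) -> float:
    """Fraction of total absolute attribution assigned to the last CoT step.

    Hypothetical metric: values near 1.0 mean attribution is concentrated
    on the final step; 1 / len(step_scores) would be a uniform spread.
    """
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    total = sum(abs(s) for s in step_scores)
    if total == 0:
        return 0.0  # no attribution mass anywhere
    return abs(step_scores[-1]) / total


# Usage with made-up scores for a three-step chain.
print(final_step_share([0.5, 1.0, 2.5]))
```

Absolute values are used so that negative attributions still count as "attention" paid to a step; a signed variant would answer a different question.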

Abstract

This study investigates the attribution patterns underlying Chain-of-Thought (CoT) reasoning in multilingual LLMs. While prior work demonstrates the role of CoT prompting in improving task performance, there are concerns regarding the faithfulness and interpretability of the generated reasoning chains. To assess these properties across languages, we applied two complementary attribution methods, ContextCite for step-level attribution and Inseq for token-level attribution, to the Qwen2.5-1.5B-Instruct model using the MGSM benchmark. Our experimental results highlight key findings such as: (1) attribution scores excessively emphasize the final reasoning step, particularly in incorrect generations; (2) structured CoT prompting significantly improves accuracy primarily for high-resource Latin-script languages; and (3) controlled perturbations via negation and distractor sentences reduce…
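Step-level attribution asks how much each CoT step contributes to the model's final answer. ContextCite itself fits a sparse linear surrogate model over many random ablations of the context; the sketch below swaps in the simpler leave-one-out ablation to illustrate the idea. The scoring function `toy_score` is a hypothetical stand-in for a quantity such as the model's log-probability of its answer, and all step labels and weights are invented for illustration.

```python
from typing import Callable, Sequence


def leave_one_out_attribution(
    steps: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> list[float]:
    """Attribute an answer score to each CoT step by ablating it.

    Leave-one-out (not ContextCite's surrogate fit): the attribution of
    step i is score(all steps) - score(all steps except step i).
    """
    full = score(steps)
    attributions = []
    for i in range(len(steps)):
        # Drop step i by position, so duplicate step texts are handled correctly.
        ablated = [s for j, s in enumerate(steps) if j != i]
        attributions.append(full - score(ablated))
    return attributions


def toy_score(steps: Sequence[str]) -> float:
    # Hypothetical stand-in for a model's answer log-probability:
    # each present step adds a fixed, made-up weight.
    weights = {"Step 1": 0.5, "Step 2": 1.0, "Step 3": 2.5}
    return sum(weights.get(s, 0.0) for s in steps)


# Usage: with these invented weights, the final step receives the most credit.
print(leave_one_out_attribution(["Step 1", "Step 2", "Step 3"], toy_score))
```

Leave-one-out needs one extra model call per step and ignores interactions between steps; a surrogate fit over random ablations, as in ContextCite, trades more calls for a model of those joint effects.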
