Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models

Zhibo Hu; Chen Wang; Yanfeng Shu; Hye-young Paik; Liming Zhu

arXiv:2601.03388·cs.CL·January 13, 2026

Open Access

TL;DR

This paper investigates how metaphors in training data influence large language models' reasoning, revealing a causal link to cross-domain misalignment and proposing a detection method based on latent feature monitoring.

Contribution

It uncovers the causal impact of metaphors on LLMs' reasoning misalignment and introduces interventions and a detector to mitigate this issue.

Findings

01

Metaphors causally increase cross-domain misalignment in LLMs.

02

Interventions on metaphors significantly alter misalignment levels.

03

A high-accuracy detector that identifies misaligned content from latent features was developed.

Abstract

Earlier research has shown that metaphors influence human decision-making, which raises the question of whether metaphors also influence the reasoning pathways of large language models (LLMs), given that their training data contain a large number of metaphors. In this work, we investigate this question within the scope of the emergent misalignment problem, in which LLMs can generalize patterns learned from misaligned content in one domain to another domain. We discover a strong causal relationship between metaphors in training data and the degree of misalignment in LLMs' reasoning content. With interventions using metaphors in the pre-training, fine-tuning, and re-alignment phases, models' cross-domain misalignment degrees change significantly. As we delve deeper into the causes behind this phenomenon, we observe that there is a connection between metaphors and the activation of global and local latent…

Taxonomy

Topics: Language, Metaphor, and Cognition · Topic Modeling · Multimodal Machine Learning Applications