A Reproducible Multi-Architecture Baseline for Token-Level Chinese Metaphor Identification under the MIPVU Framework
Yufeng Wu

TL;DR
This paper introduces a reproducible baseline for token-level Chinese metaphor identification under the MIPVU framework, compares three model families on the PSU Chinese Metaphor Corpus, analyzes their behavior and limitations, and releases resources for future research.
Contribution
The paper systematically evaluates three model families for Chinese metaphor detection under the MIPVU framework (encoder fine-tuning, MelBERT adapted to Chinese, and a QLoRA-tuned Qwen generative model) and releases comprehensive resources for reproducibility.
Findings
MelBERT MIP-only achieves the highest F1 score of 0.7281.
The SPV channel in MelBERT does not improve Chinese metaphor detection.
The Qwen-QLoRA generative model underperforms the encoder-based models by about 11 F1 points.
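The F1 figures above are token-level scores. As a minimal sketch of how such a score could be computed and aggregated, the snippet below calculates F1 on the metaphor class from binary token labels and averages it over seeds. The label encoding (1 = metaphor-related word, 0 = other), the function names, the toy data, and the choice to average over seeds are all assumptions made for illustration; they are not taken from the paper's released code.

```python
from statistics import mean


def token_f1(gold, pred):
    """F1 on the positive (metaphor) class over flat lists of 0/1 token labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_seeds(gold, preds_by_seed):
    """Average token-level F1 across runs (assumed aggregation, one run per seed)."""
    return mean(token_f1(gold, pred) for pred in preds_by_seed.values())


# Hypothetical toy data: six tokens, two runs keyed by seed.
gold = [0, 1, 0, 0, 1, 1]
preds_by_seed = {
    13: [0, 1, 0, 1, 1, 0],  # one false positive, one false negative
    42: [0, 1, 0, 0, 1, 1],  # perfect prediction
}

print(round(mean_f1_over_seeds(gold, preds_by_seed), 4))
```

Scoring only the positive class reflects the usual practice for imbalanced token-level detection tasks, where most tokens are non-metaphorical and accuracy would be dominated by the majority class.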
Abstract
Metaphor is pervasive in everyday language, yet token-level computational identification of metaphor-related words in Chinese under the MIPVU framework remains under-explored relative to English. This paper presents a reproducible multi-architecture baseline for token-level metaphor identification on the PSU Chinese Metaphor Corpus (PSU CMC), the only widely available MIPVU-annotated Chinese corpus. We systematically compare three model families: (i) encoder fine-tuning with Chinese RoBERTa-wwm-ext-large; (ii) MelBERT adapted to Chinese using a newly constructed basic-meaning resource derived from the Modern Chinese Dictionary, 7th edition (MCD7), comprising 74,823 entries with 71.51% PSU CMC vocabulary coverage; and (iii) Qwen3.5-9B fine-tuned with QLoRA as an instruction-tuned generative baseline. Across five fixed seeds, MelBERT MIP-only achieves the strongest performance at 0.7281…
