Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization

Yilong Wang; Qianli Wang; Bohao Chu; Yihong Liu; Jing Yang; Simon Ostermann

arXiv:2605.11632 · cs.CL · May 13, 2026

TL;DR

This paper introduces Macro, a preference alignment framework that uses Direct Preference Optimization (DPO) to improve the validity and minimality of multilingual counterfactual explanations that large language models generate for their own predictions.

Contribution

Macro applies preference optimization to multilingual self-generated counterfactual explanation (SCE) generation, balancing validity and minimality across diverse languages.

Findings

01

Macro improves validity by 12.55% on average over the chain-of-thought baseline.

02

Macro outperforms the translation-based baseline in both minimality and validity.

03

Preference optimization enhances cross-lingual perturbation alignment and reduces errors.

Abstract

Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond English remains challenging: existing methods struggle to produce valid SCEs in non-dominant languages, and a persistent trade-off between validity and minimality undermines explanation quality. We introduce Macro, a preference alignment framework that applies Direct Preference Optimization (DPO) to multilingual SCE generation, using a composite scoring function to construct preference pairs that effectively translate the trade-off into measurable preference signals. Experiments across four LLMs and seven typologically diverse languages show that Macro improves validity by 12.55% on average over the…
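To make the idea of turning the validity/minimality trade-off into preference signals concrete, here is a minimal Python sketch of how a composite score could rank candidate counterfactuals and yield a (chosen, rejected) pair for DPO training. The abstract does not specify the scoring function, so everything here is an assumption for illustration: the `Candidate` class, the `composite_score` form (validity plus a weighted token-level minimality term), the `alpha` weight, and the `prompt`/`chosen`/`rejected` record layout (a common convention for DPO data) are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    text: str      # candidate counterfactual produced by the model
    flipped: bool  # validity: did the model's own prediction change?


def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance computed over token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def composite_score(original: str, cand: Candidate, alpha: float = 0.5) -> float:
    """Hypothetical composite score: validity (0 or 1) plus weighted minimality.

    Minimality is 1 minus the normalized token edit distance, so smaller
    edits score higher. `alpha` is an assumed trade-off weight.
    """
    src, tgt = original.split(), cand.text.split()
    dist = token_edit_distance(src, tgt)
    minimality = 1.0 - dist / max(len(src), len(tgt), 1)
    return float(cand.flipped) + alpha * minimality


def build_preference_pair(
    original: str, candidates: list[Candidate], alpha: float = 0.5
) -> dict[str, str] | None:
    """Pick the best- and worst-scoring candidates as chosen and rejected."""
    scored = sorted(
        ((composite_score(original, c, alpha), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] == scored[-1][0]:
        return None  # no usable preference signal
    return {
        "prompt": original,
        "chosen": scored[0][1].text,
        "rejected": scored[-1][1].text,
    }


if __name__ == "__main__":
    candidates = [
        Candidate("the movie was terrible", flipped=True),
        Candidate("I hated every minute of this film", flipped=True),
        Candidate("the movie was good", flipped=False),
    ]
    print(build_preference_pair("the movie was great", candidates))
```

In this sketch a valid candidate always outranks an invalid one when `alpha` is below 1, and minimality only breaks ties among candidates with the same validity; a different weighting would shift that balance.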
