Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset

Ryoma Suzuki; Zhiyang Qi; Michimasa Inaba

arXiv:2603.22913·cs.CL·April 7, 2026

TL;DR

This paper introduces Multilingual KokoroChat, a high-quality multilingual counseling dialogue dataset created using a novel multi-LLM ensemble translation method that outperforms individual models.

Contribution

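The paper presents a multi-LLM ensemble translation approach for sensitive domains such as counseling: several distinct LLMs each propose a translation, and a single LLM then writes the final translation after analyzing the strengths and weaknesses of those candidates. The approach improves translation quality over single-model methods.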

Findings

01. Ensemble method produces preferred translations in human studies.

02. Multilingual KokoroChat dataset is publicly available.

03. Ensemble approach outperforms individual state-of-the-art LLMs.

Abstract

To address the critical scarcity of high-quality, publicly available counseling dialogue datasets, we created Multilingual KokoroChat by translating KokoroChat, a large-scale manually authored Japanese counseling corpus, into both English and Chinese. A key challenge in this process is that the optimal model for translation varies by input, making it impossible for any single model to consistently guarantee the highest quality. In a sensitive domain like counseling, where the highest possible translation fidelity is essential, relying on a single LLM is therefore insufficient. To overcome this challenge, we developed and employed a novel multi-LLM ensemble method. Our approach first generates diverse hypotheses from multiple distinct LLMs. A single LLM then produces a high-quality translation based on an analysis of the respective strengths and weaknesses of all presented hypotheses.…
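The two-stage procedure described in the abstract (several LLMs propose translations, then one LLM writes the final version after weighing them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the use of plain Python callables as stand-ins for LLM clients are all hypothetical, and the paper's actual prompts and model choices are not given in this excerpt.

```python
from typing import Callable, Dict

# Assumption: a "model" is any callable mapping a prompt string to a completion.
# In practice this would wrap a real LLM client; none is specified here.
Model = Callable[[str], str]


def generate_hypotheses(
    source: str, target_lang: str, translators: Dict[str, Model]
) -> Dict[str, str]:
    """Stage 1: collect one candidate translation from each distinct model."""
    prompt = (
        f"Translate this Japanese counseling utterance into {target_lang}.\n"
        f"Source: {source}"
    )
    return {name: model(prompt) for name, model in translators.items()}


def build_fusion_prompt(
    source: str, target_lang: str, hypotheses: Dict[str, str]
) -> str:
    """Format the source and all candidates for the single fusing model."""
    candidates = "\n".join(f"[{name}] {text}" for name, text in hypotheses.items())
    return (
        f"Source (Japanese): {source}\n"
        f"Candidate {target_lang} translations:\n{candidates}\n"
        "Analyze the strengths and weaknesses of each candidate, "
        f"then write a single final {target_lang} translation."
    )


def ensemble_translate(
    source: str, target_lang: str, translators: Dict[str, Model], fuser: Model
) -> str:
    """Stage 2: one model produces the final translation from all hypotheses."""
    hypotheses = generate_hypotheses(source, target_lang, translators)
    return fuser(build_fusion_prompt(source, target_lang, hypotheses))


if __name__ == "__main__":
    # Hypothetical stub models so the sketch runs without any API access.
    stub_translators = {
        "model_a": lambda prompt: "I have been feeling anxious lately.",
        "model_b": lambda prompt: "Recently I feel uneasy.",
    }
    stub_fuser = lambda prompt: "I have been feeling uneasy lately."
    print(ensemble_translate("最近、不安を感じています。", "English",
                             stub_translators, stub_fuser))
```

Keeping the models behind a simple callable interface means the candidate generators and the fusing model can be swapped independently, which matters if the best-performing translator varies by input as the abstract states.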

Code & Models

Repositories

UEC-InabaLab/MultilingualKokoroChat (GitHub)