Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu; Yunhao Gou; Kai Chen; Lanqing Hong; Jiahui Gao; Fei Mi; Yu Zhang; Zhenguo Li; Xin Jiang; Qun Liu; James T. Kwok

arXiv:2405.00557 · cs.CL · June 3, 2025

TL;DR

MoTE is a framework that combines structured reasoning chains with expert mixtures to improve self-alignment in large language models, including smaller ones. It targets stronger reasoning, safer outputs, and greater resistance to jailbreaks.

Contribution

This work presents MoTE, a new approach that integrates multi-step reasoning with expert mixtures using step-level routing, improving model safety and alignment without complex loss functions.

Findings

01. MoTE significantly improves safety and jailbreak resistance.

02. Effective even for smaller LLMs like 7B models.

03. Achieves performance comparable to state-of-the-art models.

Abstract

As the capabilities of large language models (LLMs) continue to expand, aligning these models with human values remains a significant challenge. Recent studies show that reasoning abilities contribute significantly to model safety, while integrating Mixture-of-Experts (MoE) architectures can further enhance alignment. In this work, we address a fundamental question: how can reasoning abilities and MoE architectures be effectively incorporated into the self-alignment process of LLMs? We propose Mixture of insighTful Experts (MoTE), a novel framework that synergistically combines reasoning chains and expert mixtures to improve self-alignment. From a data perspective, MoTE employs a structured reasoning chain comprising four key stages: Question Analysis, Answer Guidance, Safe Answer, and Safety Checking. This approach enhances safety through multi-step reasoning and proves effective even for…
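
The four-stage reasoning chain named in the abstract can be pictured as a sequential pipeline in which each stage sees the outputs of the stages before it. The sketch below is only an illustration under stated assumptions: the stage names come from the abstract, but the prompt layout, the `run_reasoning_chain` function, and the `echo_model` stand-in for an LLM call are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Stage names are taken from the abstract; everything else here is an assumption.
STAGES: List[str] = [
    "Question Analysis",
    "Answer Guidance",
    "Safe Answer",
    "Safety Checking",
]


def run_reasoning_chain(question: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Run the four stages in order, feeding each stage the earlier outputs."""
    outputs: Dict[str, str] = {}
    context = f"Question: {question}"
    for stage in STAGES:
        # Hypothetical prompt layout: accumulated context plus a stage marker.
        prompt = f"{context}\n\n[{stage}]"
        outputs[stage] = generate(prompt)
        # Append this stage's output so the next stage can condition on it.
        context = f"{prompt}\n{outputs[stage]}"
    return outputs


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: reports which stage was requested."""
    stage = prompt.rsplit("[", 1)[1].rstrip("]")
    return f"<{stage} output>"


if __name__ == "__main__":
    result = run_reasoning_chain("Example user question", echo_model)
    for stage, text in result.items():
        print(f"{stage}: {text}")
```

In practice `generate` would wrap a real model call; passing it as a parameter keeps the chain logic independent of any particular model or library.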

Taxonomy

Topics: Cognitive Science and Mapping