Multi-Branch Mutual-Distillation Transformer for EEG-Based Seizure Subtype Classification

Ruimin Peng; Zhenbang Du; Changming Zhao; Jingwei Luo; Wenzhong Liu; Xinxing Chen; Dongrui Wu

arXiv:2412.15224·eess.SP·December 23, 2024

TL;DR

This paper introduces the MBMD Transformer, a novel deep learning model that effectively classifies EEG seizure subtypes from small datasets by using multi-branch encoding and mutual knowledge distillation between raw EEG and wavelet features.

Contribution

It presents the first application of knowledge distillation in EEG seizure classification and designs a multi-branch transformer architecture tailored for small data scenarios.

Findings

1. MBMD Transformer outperforms traditional machine learning methods.

2. The model achieves superior accuracy on public EEG datasets.

3. Knowledge transfer between raw EEG and wavelet features enhances classification performance.

Abstract

Cross-subject electroencephalogram (EEG) based seizure subtype classification is very important in precise epilepsy diagnostics. Deep learning is a promising solution, due to its ability to automatically extract latent patterns. However, it usually requires a large amount of training data, which may not always be available in clinical practice. This paper proposes Multi-Branch Mutual-Distillation (MBMD) Transformer for cross-subject EEG-based seizure subtype classification, which can be effectively trained from small labeled data. MBMD Transformer replaces all even-numbered encoder blocks of the vanilla Vision Transformer by our designed multi-branch encoder blocks. A mutual-distillation strategy is proposed to transfer knowledge between the raw EEG data and its wavelets of different frequency bands. Experiments on two public EEG datasets demonstrated that our proposed MBMD Transformer…
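The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss formula, so the code below assumes a generic mutual-learning objective (per-branch cross-entropy plus pairwise KL divergence between temperature-softened predictions). The names `mutual_distillation_loss`, `block_layout`, `temperature`, and `alpha` are hypothetical, and 1-based block numbering is an assumption.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch dimension."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def cross_entropy(logits, labels, eps=1e-12):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    probs = softmax(logits)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


def mutual_distillation_loss(branch_logits, labels, temperature=2.0, alpha=0.5):
    """Assumed objective: average supervised loss over branches, plus the
    average KL divergence from every branch to every other branch.

    branch_logits: list of (batch, classes) arrays, e.g. one branch fed
    raw EEG and the others fed wavelet sub-bands (hypothetical setup).
    """
    n = len(branch_logits)
    ce = sum(cross_entropy(l, labels) for l in branch_logits) / n
    soft = [softmax(l, temperature) for l in branch_logits]
    kd, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Branch i learns from branch j's softened prediction.
                kd += kl_divergence(soft[j], soft[i])
                pairs += 1
    kd = kd / pairs if pairs else 0.0
    return ce + alpha * kd


def block_layout(depth):
    """Encoder block types when even-numbered blocks (counting from 1,
    an assumption) are replaced by multi-branch blocks."""
    return ["multi_branch" if i % 2 == 0 else "vanilla" for i in range(1, depth + 1)]
```

With this formulation, branches that already agree contribute no distillation penalty, so the loss reduces to the supervised term; disagreement between the raw-EEG and wavelet branches is what drives the knowledge transfer.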

Taxonomy

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Residual Connection · Adam · Knowledge Distillation · Vision Transformer · Multi-Head Attention