# Entropy-Based Dual-Teacher Distillation for Efficient Motor Imagery EEG Classification

**Authors:** Zefeng Xu, Zhuliang Yu

PMC · DOI: 10.3390/e28030310 · Entropy · 2026-03-10

## TL;DR

This paper introduces an entropy-based dual-teacher distillation framework that transfers ensemble knowledge to a single motor imagery EEG classifier, improving accuracy for brain-computer interfaces while keeping inference cost at that of one deployable model.

## Contribution

The novel contribution is an entropy-based dual-teacher distillation framework that reduces prediction noise and improves checkpoint stability in motor imagery EEG classification.

## Key findings

- The proposed framework achieves higher accuracy than both original models and ensembles on two public MI benchmarks.
- On BCI Competition IV-2a, the average accuracy across the three backbones rises from 0.7222 (original models) to 0.7713, above the corresponding ensembles (0.7482).
- On IV-2b, the method achieves 0.8583 accuracy, compared with 0.8432 for the original models and 0.8529 for the ensembles.
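
The size of these gains follows directly from the averages reported in the abstract. The short sketch below only subtracts those reported numbers; the dictionary layout and the helper name `gains` are illustrative choices, not part of the paper.

```python
# Average accuracies as reported in the abstract (no new data).
reported = {
    "IV-2a": {"original": 0.7222, "ensemble": 0.7482, "proposed": 0.7713},
    "IV-2b": {"original": 0.8432, "ensemble": 0.8529, "proposed": 0.8583},
}


def gains(scores):
    """Absolute accuracy gain of the proposed method over each baseline."""
    return {
        base: round(scores["proposed"] - scores[base], 4)
        for base in ("original", "ensemble")
    }


for dataset, scores in reported.items():
    print(dataset, gains(scores))
```

In absolute terms the margin over the ensemble is about 0.023 on IV-2a and about 0.005 on IV-2b, so the advantage over the ensemble teacher is noticeably smaller on IV-2b.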

## Abstract

Motor imagery (MI) EEG classification is a key component of noninvasive brain–computer interfaces (BCIs) and often must satisfy strict latency constraints in online or edge deployments. Although ensembling can reliably improve MI decoding accuracy, its inference cost grows linearly with the number of ensemble members, making it impractical for low-latency applications. To address these issues, we propose an entropy-based dual-teacher distillation framework that transfers ensemble teacher knowledge to a single deployable backbone. From an information-theoretic perspective, two failure modes are common in small and noisy MI datasets: elevated predictive entropy (noisy decisions) and large fluctuation across late training epochs (unstable convergence and unreliable checkpoint selection). Thus, we introduce an exponential moving average (EMA) teacher with entropy-gated activation as a low-pass filter in parameter space to reduce the student’s prediction noise. In addition, a two-stage cosine annealing schedule is employed to suppress late-stage oscillations and improve the robustness of final checkpoint selection. Experiments on two public MI benchmarks (BCI Competition IV-2a and IV-2b) with three representative backbones (EEGNet, ShallowConvNet, and ATCNet) under the subject-dependent protocol show consistent accuracy gains over the ensemble teacher and strong distillation baselines. On IV-2a, our method achieves an average accuracy of 0.7713 across the backbones, surpassing both the original models (0.7222) and the corresponding ensembles (0.7482); on IV-2b, it achieves 0.8583 versus 0.8432 (original) and 0.8529 (ensemble).
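
The abstract names three ingredients without giving formulas in this summary view: predictive entropy, an entropy-gated EMA teacher, and a two-stage cosine annealing schedule. The NumPy sketch below illustrates one plausible reading of each and is not the authors' implementation. Predictive entropy and the EMA update are the standard definitions. The gating rule (a fixed threshold on mean batch entropy), the shape of the schedule (two consecutive cosine segments), and every function name and hyperparameter are assumptions made for illustration.

```python
import math

import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def ema_update(teacher, student, decay=0.99):
    """Standard EMA step: teacher <- decay * teacher + (1 - decay) * student.

    Both arguments are dicts mapping parameter names to arrays.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def ema_teacher_active(student_probs, threshold):
    """ASSUMED gate: enable the EMA teacher when mean student entropy is high.

    The paper's actual gating criterion is not given in this summary.
    """
    return float(np.mean(predictive_entropy(student_probs))) > threshold


def two_stage_cosine_lr(epoch, total_epochs, split, lr_max, lr_mid, lr_min):
    """ASSUMED schedule: cosine decay lr_max -> lr_mid over epochs [0, split),
    then a second cosine decay lr_mid -> lr_min over [split, total_epochs).
    """
    if epoch < split:
        start, end = lr_max, lr_mid
        t = epoch / split
    else:
        start, end = lr_mid, lr_min
        t = (epoch - split) / max(1, total_epochs - split - 1)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))
```

Under these assumptions, a uniform prediction over four classes has entropy ln 4 (about 1.386 nats) and would switch the EMA teacher on for any threshold below that value, while a confident one-hot prediction has entropy near zero and would leave it off. The second cosine segment keeps the learning rate small and smoothly decreasing in the final epochs, which is the regime where the abstract reports oscillations that make checkpoint selection unreliable.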

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13026056/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13026056/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC13026056/full.md

---
Source: https://tomesphere.com/paper/PMC13026056