BrainDistill: Implantable Motor Decoding with Task-Specific Knowledge Distillation
Yuhan Xie, Jinhan Liu, Xiaoyong Ni, Fei Tan, Icare Sakr, Thibault Collin, Shiqi Sun, Alejandro Rodriguez Guajardo, Demon Fanny, Charles-francois Vincent Latchoumane, Henri Lorach, Jocelyne Bloch, Gregoire Courtine, Mahsa Shoaran

TL;DR
BrainDistill introduces a power-efficient, implantable neural decoder with task-specific knowledge distillation and quantization, achieving high decoding accuracy suitable for implantable brain-computer interfaces.
Contribution
It presents a novel implantable neural decoder framework with task-specific knowledge distillation and quantization-aware training for efficient brain decoding.
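The contribution pairs distillation with quantization-aware training (QAT). This summary does not give the bit widths or quantizer the authors use, so the following is only a generic sketch of the two standard QAT building blocks. It assumes symmetric per-tensor uniform quantization. The first block is a fake-quantize step (quantize, then dequantize) used in the forward pass. The second is a straight-through estimator (STE) that passes gradients through the rounding unchanged but zeroes them for clipped inputs. The names `max_abs_scale`, `fake_quantize`, and `ste_backward` are hypothetical.

```python
from typing import List, Sequence


def max_abs_scale(values: Sequence[float], num_bits: int = 8) -> float:
    """Symmetric per-tensor scale: the largest |value| maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit signed integers
    largest = max(abs(v) for v in values)
    return largest / qmax if largest > 0 else 1.0


def fake_quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[float]:
    """Forward pass: round to the integer grid, clip to the signed range, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # integer code the hardware would store
        out.append(q * scale)  # float value seen by the rest of the training graph
    return out


def ste_backward(values: Sequence[float], upstream: Sequence[float],
                 scale: float, num_bits: int = 8) -> List[float]:
    """Backward pass (STE): treat rounding as identity, but block gradients of clipped inputs."""
    limit = (2 ** (num_bits - 1) - 1) * scale
    return [g if -limit <= v <= limit else 0.0 for v, g in zip(values, upstream)]


if __name__ == "__main__":
    weights = [0.5, -1.0, 3.0, -2.0]
    scale = 0.01  # fixed illustrative scale; 3.0 and -2.0 fall outside the representable range
    print(fake_quantize(weights, scale))
    print(ste_backward(weights, [1.0] * 4, scale))
```

During QAT the network trains on fake-quantized values, so it learns to tolerate rounding error. At deployment only the integer codes and the scale need to be stored.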
Findings
The implantable neural decoder (IND) outperforms prior neural decoders on motor decoding tasks
Task-specific knowledge distillation (TSKD) improves few-shot calibration performance
Quantized IND enables low-power deployment with minimal accuracy loss
Abstract
Transformer-based neural decoders with large parameter counts, pre-trained on large-scale datasets, have recently outperformed classical machine learning models and small neural networks on brain-computer interface (BCI) tasks. However, their large parameter counts and high computational demands hinder deployment in power-constrained implantable systems. To address this challenge, we introduce BrainDistill, a novel implantable motor decoding pipeline that integrates an implantable neural decoder (IND) with a task-specific knowledge distillation (TSKD) framework. Unlike standard feature distillation methods that attempt to preserve teacher representations in full, TSKD explicitly prioritizes features critical for decoding through supervised projection. Across multiple neural datasets, IND consistently outperforms prior neural decoders on motor decoding tasks, while its TSKD-distilled…
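The abstract says TSKD prioritizes decoding-relevant features through supervised projection rather than preserving the full teacher representation. One reviewer describes it as two steps: supervised compression followed by fixed alignment. The exact projection is not given in this summary, so the sketch below is only one plausible reading under stated assumptions. Step 1 builds an orthonormal basis spanning the centered class means of the teacher's features, a simplified between-class subspace used here as a stand-in for the paper's supervised projection. Step 2 freezes that basis and penalizes the mean squared error between the student's outputs and the projected teacher features. It assumes the student features have already been mapped to the same low dimension (for example by a learnable linear head, omitted here). All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Orthonormalize vectors, dropping any that are numerically linearly dependent."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def supervised_projection(teacher_feats, labels) -> Tuple[Vector, List[Vector]]:
    """Step 1 (assumed form): basis of the subspace spanned by centered class means."""
    global_mean = mean(teacher_feats)
    centered_means = []
    for c in sorted(set(labels)):
        class_mean = mean([f for f, y in zip(teacher_feats, labels) if y == c])
        centered_means.append([m - g for m, g in zip(class_mean, global_mean)])
    return global_mean, gram_schmidt(centered_means)


def project(feature, global_mean, basis) -> Vector:
    """Coordinates of a centered feature vector in the fixed class-discriminative basis."""
    centered = [f - g for f, g in zip(feature, global_mean)]
    return [dot(centered, b) for b in basis]


def distillation_loss(student_outs, teacher_feats, global_mean, basis) -> float:
    """Step 2: MSE between student outputs and the frozen projected teacher targets."""
    total = 0.0
    for s, t in zip(student_outs, teacher_feats):
        target = project(t, global_mean, basis)
        total += sum((si - ti) ** 2 for si, ti in zip(s, target)) / len(target)
    return total / len(student_outs)


if __name__ == "__main__":
    # Toy 3-D teacher features for two classes; the middle dimension is constant.
    teacher = [[1.0, 0.0, 5.0], [3.0, 0.0, 5.0], [-1.0, 0.0, -5.0], [-3.0, 0.0, -5.0]]
    labels = [0, 0, 1, 1]
    mu, basis = supervised_projection(teacher, labels)
    print(len(basis))  # two classes yield at most one discriminative direction
    print(distillation_loss([[0.0]] * 4, teacher, mu, basis))
```

Because the student only has to match the few class-discriminative coordinates instead of the whole teacher embedding, a small decoder is not forced to spend capacity on teacher features irrelevant to the task. Other supervised projections (for example LDA) could replace step 1; which one the paper uses is not stated here.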
Peer Reviews
Decision · Submitted to ICLR 2026
1. Implantable BCIs impose unique hardware limits, and the authors correctly identify power efficiency as a major bottleneck. The quantization analysis and power consumption estimates are therefore relevant to practical deployment. 2. The mathematical exposition of projection-based distillation aids understanding of feature compression. 3. The paper covers three neural recording modalities (ECoG, EEG, spikes) and shows decoding performance improvements across all of them.
1. There is no comparison against other task-oriented or projection-based distillation methods (e.g., [1-3]) under identical training conditions. 2. There is no ablation separating the contribution of the IND architecture from that of TSKD itself. 3. Architecturally, the model is very similar to [1-3] and other task-specific KD approaches, so the novelty of the method is questionable. There is no support for the following claim: "However, existing KD methods primarily aim to preserve teacher embeddings as fully as possible (Miles et a
1. The motivation, i.e. bridging the gap between large neural decoders and implantable hardware, is timely and clearly articulated. TSKD addresses a concrete limitation of standard distillation (feature mismatch and capacity gap) with a principled projection-based approach. 2. The two-step projection method (supervised compression followed by fixed alignment) is well-designed. TSR provides an interpretable quantitative measure that correlates with distillation quality and offers practical diagno
1. It would be helpful to understand whether TSKD’s projections depend critically on the quality of the teacher classifier and how sensitive TSR is to teacher miscalibration. 2. The paper assumes TSR correlates with downstream accuracy, but this relationship is only shown qualitatively. Quantitative correlation plots between TSR and decoding performance across projection types would strengthen the claim. 3. The power numbers appear simulation-based rather than measured. Including hardware protot
The paper presents a solid teacher-student framework. The mathematics behind the methodology is well described and flows nicely. Extending the evaluation beyond EEG to ECoG and spike recordings is also very interesting. Writing: The paper is well written and well structured.
The main objective of the paper is not clear. Is the main contribution the distillation methodology or the IND architecture, which, as the authors themselves state, is fairly basic? This should be described more clearly. Overall: The paper has some merits, but I would like to see my questions answered.
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Neurological disorders and treatments · Advanced Memory and Neural Computing
