BrainDistill: Implantable Motor Decoding with Task-Specific Knowledge Distillation

Yuhan Xie; Jinhan Liu; Xiaoyong Ni; Fei Tan; Icare Sakr; Thibault Collin; Shiqi Sun; Alejandro Rodriguez Guajardo; Demon Fanny; Charles-francois Vincent Latchoumane; Henri Lorach; Jocelyne Bloch; Gregoire Courtine; Mahsa Shoaran

arXiv:2601.17625·cs.LG·January 27, 2026

Open Access · 3 Reviews

TL;DR

BrainDistill introduces a power-efficient, implantable neural decoder with task-specific knowledge distillation and quantization, achieving high decoding accuracy suitable for implantable brain-computer interfaces.

Contribution

It presents a novel implantable neural decoder framework with task-specific knowledge distillation and quantization-aware training for efficient brain decoding.

Findings

01 IND outperforms prior neural decoders on motor tasks

02 TSKD improves few-shot calibration performance

03 Quantized IND enables low-power deployment with minimal accuracy loss
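
As a rough illustration of what weight quantization involves, the sketch below applies symmetric uniform "fake quantization": weights are rounded to a signed integer grid and mapped back to floats, so the rounding error becomes visible. The 8-bit width, the single per-tensor scale, and the function name `fake_quantize` are assumptions made for this example and are not taken from the paper. In quantization-aware training, a rounding step of this kind is typically applied in the forward pass while gradients are passed through it unchanged (a straight-through estimator).

```python
from typing import List, Tuple


def fake_quantize(weights: List[float], num_bits: int = 8) -> Tuple[List[float], float]:
    """Round weights to a signed integer grid, then map them back to floats.

    Hypothetical helper for illustration; bit width and scaling are assumptions.
    Returns the dequantized weights and the scale (the grid step size).
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0           # all-zero tensor: nothing to quantize
    scale = max_abs / qmax                  # one scale shared by the whole tensor
    levels = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in levels], scale


weights = [0.5, -1.0, 0.25, 0.0]
dequantized, scale = fake_quantize(weights)
# Each weight moves by at most half a grid step.
worst_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(f"scale={scale:.6f}, worst rounding error={worst_error:.6f}")
```

Fewer bits give a coarser grid and therefore a larger worst-case rounding error, which is the trade-off against power and memory that a quantized decoder has to balance.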

Abstract

Transformer-based neural decoders with large parameter counts, pre-trained on large-scale datasets, have recently outperformed classical machine learning models and small neural networks on brain-computer interface (BCI) tasks. However, their large parameter counts and high computational demands hinder deployment in power-constrained implantable systems. To address this challenge, we introduce BrainDistill, a novel implantable motor decoding pipeline that integrates an implantable neural decoder (IND) with a task-specific knowledge distillation (TSKD) framework. Unlike standard feature distillation methods that attempt to preserve teacher representations in full, TSKD explicitly prioritizes features critical for decoding through supervised projection. Across multiple neural datasets, IND consistently outperforms prior neural decoders on motor decoding tasks, while its TSKD-distilled…
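
The contrast drawn in the abstract, between matching teacher features in full and matching only a task-relevant projection of them, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the projection is built from class centroids as a simple stand-in for a supervised projection, and the names `supervised_projection`, `project`, and `distill_loss` are hypothetical. It is not the paper's TSKD procedure.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def mean_vector(rows: Matrix) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def supervised_projection(teacher_feats: Matrix, labels: List[int]) -> Matrix:
    """Stand-in supervised projection: one row per class, equal to the
    class centroid minus the global centroid. Feature directions that do
    not separate the classes receive (near-)zero weight."""
    global_mean = mean_vector(teacher_feats)
    proj = []
    for c in sorted(set(labels)):
        members = [f for f, y in zip(teacher_feats, labels) if y == c]
        centroid = mean_vector(members)
        proj.append([a - b for a, b in zip(centroid, global_mean)])
    return proj


def project(feat: Vector, proj: Matrix) -> Vector:
    """Apply the linear projection to a single feature vector."""
    return [sum(p * x for p, x in zip(row, feat)) for row in proj]


def distill_loss(student_feats: Matrix, teacher_feats: Matrix, proj: Matrix) -> float:
    """Mean squared error between student features and projected teacher features."""
    total, count = 0.0, 0
    for s, t in zip(student_feats, teacher_feats):
        target = project(t, proj)
        total += sum((a - b) ** 2 for a, b in zip(s, target))
        count += len(target)
    return total / count


# Toy teacher features: dimension 0 carries the label, dimension 2 is large
# but label-independent, so the projection gives it zero weight.
teacher = [[1.0, 0.0, 5.0], [1.0, 0.0, -5.0], [-1.0, 0.0, 5.0], [-1.0, 0.0, -5.0]]
labels = [0, 0, 1, 1]
proj = supervised_projection(teacher, labels)
student = [[0.0, 0.0] for _ in teacher]  # an untrained 2-dimensional student
print(distill_loss(student, teacher, proj))
```

In this toy setting the student only has to match two task-relevant coordinates rather than all three teacher dimensions, which is the intuition behind distilling into a much smaller model.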

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. Implantable BCIs impose unique hardware limits, and the authors correctly identify power efficiency as a major bottleneck, so the quantization analysis and power consumption estimates are relevant to practical deployment.
2. The mathematical exposition of projection-based distillation is useful for understanding feature compression.
3. The paper covers three neural recording modalities (ECoG, EEG, spikes) and shows decoding performance improvements across them.

Weaknesses

1. No comparison is made against other task-oriented or projection-based distillation methods (e.g. [1-3]) under identical training conditions.
2. There is no ablation comparing the contribution of the IND architecture with that of TSKD itself.
3. Architecturally, the model is very similar to [1-3] and to similar task-specific KD approaches, so the novelty of the method is in question. There is no support for the following: "However, existing KD methods primarily aim to preserve teacher embeddings as fully as possible (Miles et a

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The motivation, i.e. bridging the gap between large neural decoders and implantable hardware, is timely and clearly articulated. TSKD addresses a concrete limitation of standard distillation (feature mismatch and capacity gap) with a principled projection-based approach.
2. The two-step projection method (supervised compression followed by fixed alignment) is well-designed. TSR provides an interpretable quantitative measure that correlates with distillation quality and offers practical diagno

Weaknesses

1. It would be helpful to understand whether TSKD’s projections depend critically on the quality of the teacher classifier and how sensitive TSR is to teacher miscalibration.
2. The paper assumes TSR correlates with downstream accuracy, but this relationship is only shown qualitatively. Quantitative correlation plots between TSR and decoding performance across projection types would strengthen the claim.
3. The power numbers appear simulation-based rather than measured. Including hardware protot

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper presents a solid teacher-student model. The mathematics behind the methodology is well described and flows nicely. Extending beyond EEG to ECoG and spikes is also very interesting. Writing: the paper is well written and well structured.

Weaknesses

The main objective of the paper is not clear. Is the main purpose the distillation methodology or the IND architecture (which, as the authors claim, is pretty basic)? This should be better described. Overall: the paper shows some merit, but it would be interesting to have my questions answered.

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Neurological disorders and treatments · Advanced Memory and Neural Computing