Scaling and Distilling Transformer Models for sEMG

Nicholas Mehlman; Jean-Christophe Gagnon-Audet; Michael Shvartsman; Kelvin Niu; Alexander H. Miller; Shagun Sodhani

arXiv:2507.22094·eess.AS·July 31, 2025

TL;DR

This paper demonstrates that vanilla transformer models can be scaled up to 110M parameters on sEMG data with improved cross-user performance, and that models above 100M parameters can be distilled into models 50x smaller with less than 1.5% absolute performance loss, making them practical for real-time human-computer interfaces.

Contribution

It shows that scaling transformers up to 110M parameters improves sEMG task performance and that these large models can be distilled into much smaller models with minimal accuracy loss.

Findings

01

Scaling transformers improves cross-user sEMG performance.

02

Models with more than 100M parameters can be distilled into models 50x smaller with minimal performance loss (<1.5% absolute).

03

The 110M-parameter scale goes well beyond the model sizes studied in other sEMG research (usually <10M parameters).

Abstract

Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, the limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. In this paper, we demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters, surpassing the model size regime investigated in other sEMG research (usually <10M parameters). We show that >100M-parameter models can be effectively distilled into models 50x smaller with minimal loss of performance (<1.5% absolute). This results in efficient and expressive models suitable for complex real-time sEMG tasks in real-world environments.
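The abstract does not say which distillation objective the authors use. As a rough illustration of what distilling a large teacher into a small student can look like, the sketch below assumes a classification-style sEMG task and a standard soft-target (logit-matching) loss. The function names, the temperature, and the weighting `alpha` are all hypothetical choices for illustration, not values or methods taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    `temperature` and `alpha` are illustrative hyperparameters (assumptions),
    not settings reported in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term's magnitude comparable across temperatures.
    soft = (temperature ** 2) * kl_divergence(p_teacher, p_student)
    # Ordinary cross-entropy of the student against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical logits for a 3-class example; class 0 is the true label.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
loss_close = distillation_loss(close_student, teacher, label=0)
loss_far = distillation_loss(far_student, teacher, label=0)
```

A student whose logits track the teacher's gets a lower loss than one that disagrees with it, which is the signal a small model would be trained on. In practice this would be computed over batches with an autodiff framework rather than over plain Python lists.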
