L2T-Hyena: Enhancing State-Space Models with an Adaptive Learn-to-Teach Framework
Fatemeh Sohbati, Farzan Haddadi, Hamid Salahinejad

TL;DR
This paper introduces L2T-Hyena, a hybrid state-space model whose loss function is adapted during training under the guidance of a teacher network, significantly improving language modeling performance on the Penn Treebank (PTB) and WikiText-103 benchmarks.
Contribution
It proposes a novel adaptive loss framework for state-space models using a learn-to-teach paradigm, enhancing training effectiveness and model performance.
Findings
L2T-Hyena outperforms vanilla Hyena and Transformer baselines on PTB and WikiText-103.
Adaptive loss functions improve sequence modeling accuracy.
The approach yields significant perplexity improvements on both benchmarks.
Abstract
State-space models (SSMs) have recently emerged as efficient alternatives to computationally intensive architectures such as Transformers for sequence modeling. However, their training typically relies on static loss functions, which may be suboptimal at different stages of learning. In this work, we introduce a hybrid model that integrates the Hyena architecture with a Dynamic Loss Network (DLN) under a Learning-to-Teach (L2T) paradigm, referred to as L2T-DLN. In this framework, the Hyena model serves as a student whose loss function is adapted online, while a teacher model, equipped with a memory of the student's past performance, guides the DLN to dynamically trade off the primary cross-entropy objective and a regularization term. We evaluate the proposed L2T-Hyena model on the Penn Treebank (PTB) and WikiText-103 language modeling benchmarks and compare it against both a vanilla…
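To make the trade-off described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The names `ToyTeacher`, `mixing_weight`, and `dynamic_loss` are hypothetical, the L2 penalty stands in for the unspecified regularization term, and the teacher here is a fixed heuristic (more regularization when the recent loss trend is rising) rather than a learned network.

```python
import math
from collections import deque


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under predicted probabilities."""
    return -math.log(probs[target])


def l2_penalty(weights):
    """Sum of squared weights; an assumed stand-in for the regularization term."""
    return sum(w * w for w in weights)


class ToyTeacher:
    """Hypothetical teacher: remembers recent student losses and emits a
    mixing weight in (0, 1). A learned teacher would replace this heuristic."""

    def __init__(self, memory_size=5, gain=1.0):
        self.memory = deque(maxlen=memory_size)  # bounded memory of past losses
        self.gain = gain

    def observe(self, loss):
        self.memory.append(loss)

    def mixing_weight(self):
        if len(self.memory) < 2:
            return 0.5  # no trend yet: weight both terms equally
        trend = self.memory[-1] - self.memory[0]  # > 0 means the loss is rising
        return 1.0 / (1.0 + math.exp(-self.gain * trend))  # sigmoid of the trend


def dynamic_loss(probs, target, weights, lam):
    """Convex combination of the primary objective and the regularizer."""
    return (1.0 - lam) * cross_entropy(probs, target) + lam * l2_penalty(weights)


if __name__ == "__main__":
    teacher = ToyTeacher()
    for past_loss in (2.0, 1.5, 1.0):  # student loss is falling
        teacher.observe(past_loss)
    lam = teacher.mixing_weight()
    print(lam, dynamic_loss([0.7, 0.3], 0, [0.1, -0.2], lam))
```

In this toy version a falling loss pushes the mixing weight below 0.5, so the cross-entropy term dominates; a rising loss shifts weight toward the regularizer. Any other monotone mapping from the loss history to a weight would serve the illustration equally well.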
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Machine Learning and Data Classification
