Hecto: Modular Sparse Experts for Adaptive and Interpretable Reasoning
Sanskar Pandey, Ruhaan Chopra, Saad Murtaza Bhat, Ark Abhyudaya

TL;DR
Hecto is a sparse mixture-of-experts architecture that combines architecturally different expert modules for specialized reasoning, improving interpretability and efficiency across diverse tasks while staying close to baseline performance.
Contribution
It presents a novel heterogeneous MoE architecture that combines a GRU expert and an FFNN expert, demonstrating improved interpretability and specialization in reasoning tasks.
Findings
Hecto matches or closely trails homogeneous baselines across the evaluated benchmarks (AG News, SST-2, HotpotQA, STS-B).
Experts specialize in distinct reasoning types, enhancing interpretability.
Performance improves at larger batch sizes due to architectural flexibility.
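One way to inspect the kind of expert specialization described in the findings is to tally, per class label, how often the router picks each expert. The sketch below is illustrative only: the function name `routing_table`, the expert names, and the toy labels and routing decisions are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

def routing_table(labels, expert_ids, expert_names):
    """Return, for each label, the fraction of examples routed to each expert."""
    counts = defaultdict(Counter)
    for label, k in zip(labels, expert_ids):
        counts[label][expert_names[k]] += 1
    table = {}
    for label, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for experts that were never chosen for this label.
        table[label] = {name: c[name] / total for name in expert_names}
    return table

# Toy data (hypothetical): five examples, their labels, and the chosen expert index.
labels = ["sports", "sports", "business", "business", "business"]
expert_ids = [0, 0, 1, 1, 0]
table = routing_table(labels, expert_ids, ["gru", "ffnn"])
for label, shares in table.items():
    print(label, shares)
```

A strongly skewed row, where one expert receives most of a label's examples, would be consistent with specialization; near-uniform rows would suggest the router is not separating inputs by type.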
Abstract
Mixture-of-Experts (MoE) models enable conditional computation by routing inputs to specialized experts, but these experts rely on identical inductive biases, thus limiting representational diversity. This static computation pathway is inefficient for inputs that require different types of reasoning and limits specialization and interpretability. We propose Hecto, a lightweight MoE architecture that leverages architectural heterogeneity by combining a GRU expert for temporal reasoning and an FFNN expert for static abstraction under a sparse Top-1 gating mechanism. Evaluated on three reasoning benchmarks (AG News, SST-2, HotpotQA) and a regression task (STS-B), Hecto matches or closely trails homogeneous baselines in performance despite receiving isolated input representations, while achieving clear expert specialization, with each expert aligning to distinct reasoning types (temporal vs…
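The routing idea in the abstract, a GRU expert and an FFNN expert behind a sparse Top-1 gate, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the mean-pooled gate input, the bias-free layers, the random weights, and the scaling of the output by the gate probability are all choices made here for illustration.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mean_pool(tokens):
    return [sum(col) / len(tokens) for col in zip(*tokens)]

class GRUExpert:
    """Recurrent expert: reads token vectors in order, returns the final hidden state."""
    def __init__(self, d_in, d_hid, rng):
        # One input and one recurrent matrix per gate: update (z), reset (r), candidate (n).
        self.W = {g: rand_matrix(d_hid, d_in, rng) for g in "zrn"}
        self.U = {g: rand_matrix(d_hid, d_hid, rng) for g in "zrn"}
        self.d_hid = d_hid

    def __call__(self, tokens):
        h = [0.0] * self.d_hid
        for x in tokens:
            z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
            r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
            rh = [ri * hi for ri, hi in zip(r, h)]
            n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
            h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return h

class FFNNExpert:
    """Static expert: ignores token order by mean-pooling, then applies two layers."""
    def __init__(self, d_in, d_hid, rng):
        self.W1 = rand_matrix(d_hid, d_in, rng)
        self.W2 = rand_matrix(d_hid, d_hid, rng)

    def __call__(self, tokens):
        hidden = [max(0.0, v) for v in matvec(self.W1, mean_pool(tokens))]  # ReLU
        return matvec(self.W2, hidden)

class Top1MoE:
    """Sparse Top-1 gate: only the highest-scoring expert is evaluated per input."""
    def __init__(self, d_in, d_hid, seed=0):
        rng = random.Random(seed)
        self.experts = [GRUExpert(d_in, d_hid, rng), FFNNExpert(d_in, d_hid, rng)]
        self.gate = rand_matrix(len(self.experts), d_in, rng)

    def __call__(self, tokens):
        logits = matvec(self.gate, mean_pool(tokens))  # assumption: gate sees pooled input
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        probs = [e / sum(exps) for e in exps]
        k = max(range(len(probs)), key=probs.__getitem__)
        # Assumption: scale the chosen expert's output by its gate probability.
        out = [probs[k] * v for v in self.experts[k](tokens)]
        return out, k, probs

# Usage: route one random sequence of 5 token vectors (dimension 4).
rng = random.Random(1)
tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
moe = Top1MoE(d_in=4, d_hid=3)
output, chosen, probs = moe(tokens)
print("chosen expert:", ["GRU", "FFNN"][chosen], "gate probs:", probs)
```

Because only the selected expert runs, compute per input stays close to that of a single expert, and the index `chosen` gives a direct, per-example record of which inductive bias the router preferred.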
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
