Balancing Computation Load and Representation Expressivity in Parallel Hybrid Neural Networks

Mohammad Mahdi Moradi; Walid Ahmed; Shuangyue Wen; Sudhir Mudur; Weiwei Zhang; Yang Liu

arXiv:2505.19472·cs.CL·May 29, 2025

TL;DR

FlowHN is a novel parallel hybrid neural network architecture that balances computation load and enhances representation expressivity, leading to faster processing and improved accuracy in language modeling.

Contribution

The paper introduces FlowHN, which employs dynamic token splitting and output fusion strategies to optimize load balancing and representation in parallel hybrid networks.
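To make the idea of load-balanced token splitting and output fusion concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes each branch has a constant per-token cost (real attention cost grows with sequence length), splits the sequence at a single point so that both branches receive roughly equal work, and fuses by simple concatenation. The names `split_point`, `parallel_block`, `branch_a`, `branch_b`, `cost_a`, and `cost_b` are hypothetical, and the two branches are called one after the other here, whereas a real implementation would run them concurrently.

```python
from typing import Callable, List

Branch = Callable[[List[float]], List[float]]


def split_point(num_tokens: int, cost_a: float, cost_b: float) -> int:
    """Tokens to give branch A so that n_a * cost_a is close to n_b * cost_b.

    Assumes a constant per-token cost for each branch (a simplification).
    Solving n_a * cost_a = (n - n_a) * cost_b gives n_a = n * cost_b / (cost_a + cost_b).
    """
    if cost_a <= 0 or cost_b <= 0:
        raise ValueError("per-token costs must be positive")
    n_a = round(num_tokens * cost_b / (cost_a + cost_b))
    return max(0, min(num_tokens, n_a))


def parallel_block(tokens: List[float], branch_a: Branch, branch_b: Branch,
                   cost_a: float, cost_b: float) -> List[float]:
    """Split tokens by relative cost, process each part, fuse by concatenation."""
    n_a = split_point(len(tokens), cost_a, cost_b)
    out_a = branch_a(tokens[:n_a])   # the more expensive branch gets fewer tokens
    out_b = branch_b(tokens[n_a:])
    return out_a + out_b             # stand-in fusion: restore original order


# Stand-in branches; a real block would use an attention layer and an SSM layer.
expensive = lambda xs: [x * 10 for x in xs]
cheap = lambda xs: [-x for x in xs]

fused = parallel_block(list(range(12)), expensive, cheap, cost_a=3.0, cost_b=1.0)
print(fused)
```

With a branch that is three times as expensive per token, the sketch assigns it a quarter of the tokens, so both branches carry about the same total work.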

Findings

1. FlowHN achieves up to 4x higher Tokens per Second.
2. FlowHN doubles Model FLOPs Utilization.
3. FlowHN outperforms existing hybrid models in accuracy.
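For readers unfamiliar with the two efficiency metrics named in the findings, the sketch below shows how they are commonly computed: tokens per second is throughput over wall-clock time, and Model FLOPs Utilization is the FLOPs the model needs per second divided by the hardware's peak FLOPs per second. The function names and all numbers are hypothetical and are not taken from the paper; `flops_per_token` in particular has to be supplied for the model being measured.

```python
def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput: tokens processed divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_tokens / elapsed_s


def model_flops_utilization(tps: float, flops_per_token: float,
                            peak_flops_per_s: float) -> float:
    """MFU: model FLOPs needed per second as a fraction of hardware peak."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return tps * flops_per_token / peak_flops_per_s


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
tps = tokens_per_second(num_tokens=1_000_000, elapsed_s=10.0)
mfu = model_flops_utilization(tps, flops_per_token=7.5e8, peak_flops_per_s=3.0e14)
print(f"{tps:.0f} tokens/s, MFU = {mfu:.2%}")
```

In this made-up example, 100,000 tokens per second at 7.5e8 FLOPs per token against a 3e14 FLOP/s peak works out to an MFU of 25%. For a fixed model and device, MFU scales linearly with throughput.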

Abstract

Attention and State-Space Models (SSMs) provide complementary strengths when combined in a hybrid network, either in sequence or in parallel. A sequential hybrid pipeline alternates between applying a transformer to the input and feeding its output into an SSM. Each component therefore sits idle while the other runs, which increases end-to-end latency and lowers achievable throughput. In the parallel hybrid architecture, the transformer operates independently in parallel with the SSM, and these pairs are cascaded, with the output of one pair forming the input to the next. Two issues arise: (i) creating an expressive knowledge representation from the inherently divergent outputs of the separate branches, and (ii) balancing the computational load between the parallel branches while maintaining representation fidelity. In this work we present FlowHN, a novel parallel hybrid network architecture…

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Attentive Walk-Aggregating Graph Neural Network