Dispatcher: A Message-Passing Approach To Language Modelling
Alberto Cetoli

TL;DR
This paper introduces Dispatcher, a message-passing layer that replaces self-attention in unidirectional language models. For N tokens it has O(N logN) computational complexity and O(N) memory complexity under reasonable assumptions, while achieving perplexity comparable to prior results.
Contribution
It presents a new message-passing layer, Dispatcher, that substitutes for self-attention in unidirectional sequence generation tasks, reaching comparable perplexity more efficiently.
Findings
Computational complexity of O(N logN) for N tokens, under reasonable assumptions
Perplexity comparable to prior results
Memory complexity of O(N) for N tokens, under the same assumptions
Abstract
This paper proposes a message-passing mechanism to address language modelling. A new layer type is introduced that aims to substitute self-attention for unidirectional sequence generation tasks. The system is shown to be competitive with existing methods: Given N tokens, the computational complexity is O(N logN) and the memory complexity is O(N) under reasonable assumptions. In the end, the Dispatcher layer is seen to achieve comparable perplexity to prior results while being more efficient.
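To give a feel for how a causal layer can stay within O(N logN) time and O(N) memory, the sketch below uses log-step doubling (a Hillis-Steele prefix scan). In each round, every position receives a message from the position a power-of-two distance behind it. This is an illustrative assumption, not the Dispatcher layer as defined in the paper: the function name `causal_doubling_sum` is hypothetical, and plain addition stands in for whatever learned message function the actual layer uses.

```python
def causal_doubling_sum(values):
    """Inclusive prefix sums via log-step doubling (Hillis-Steele scan).

    Illustrative only: plain addition stands in for a learned message
    function. Each round costs O(N) and there are ceil(log2 N) rounds,
    so the total work is O(N log N) with O(N) working memory.
    """
    n = len(values)
    state = list(values)
    rounds = 0
    shift = 1
    while shift < n:
        # Position i receives the message held by position i - shift.
        # Only earlier positions send, so information flow stays causal.
        # The comprehension reads the old `state` before it is rebound.
        state = [
            state[i] + (state[i - shift] if i >= shift else 0)
            for i in range(n)
        ]
        shift *= 2
        rounds += 1
    return state, rounds


prefix, rounds = causal_doubling_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(prefix)  # [1, 3, 6, 10, 15, 21, 28, 36]
print(rounds)  # 3
```

After the final round each position has aggregated every earlier position, which is the same reach that causal self-attention obtains by comparing all O(N^2) token pairs.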
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
