Set Interdependence Transformer: Set-to-Sequence Neural Networks for Permutation Learning and Structure Prediction
Mateusz Jurewicz, Leon Derczynski

TL;DR
The paper introduces the Set Interdependence Transformer, a neural network model that efficiently captures higher-order interactions in set-to-sequence tasks, improving performance across various domains including NLP and structure prediction.
Contribution
It proposes a novel set encoding method that enhances modeling of higher-order relations and integrates it into a complete set-to-sequence framework with state-of-the-art results.
Findings
Achieves state-of-the-art performance on multiple tasks
Demonstrates ability to generalize to unseen sequence lengths
Provides empirical analysis of higher-order interaction learning
Abstract
The task of learning to map an input set onto a permuted sequence of its elements is challenging for neural networks. Set-to-sequence problems occur in natural language processing, computer vision and structure prediction, where interactions between elements of large sets define the optimal output. Models must exhibit relational reasoning, handle varying cardinalities and manage combinatorial complexity. Previous attention-based methods require n layers of their set transformations to explicitly represent n-th order relations. Our aim is to enhance their ability to efficiently model higher-order interactions through an additional interdependence component. We propose a novel neural set encoding method called the Set Interdependence Transformer, capable of relating the set's permutation-invariant representation to its elements within sets of any cardinality. We combine it with a…
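The idea of relating a set's permutation-invariant representation to its individual elements can be illustrated with a small sketch. This is not the authors' implementation: mean pooling as the set summary, a single unnormalized attention head, and the names `self_attention` and `encode_with_set_summary` are all assumptions chosen for illustration. The sketch appends a pooled summary vector to the elements and runs one self-attention step over the augmented set, so each element can attend to the whole-set summary and the summary can attend to each element.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of x.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def encode_with_set_summary(elements, w_q, w_k, w_v):
    """Hypothetical encoder step: elements has shape (n, d), for any n >= 1."""
    # Assumption: mean pooling stands in for the permutation-invariant
    # set representation; the paper may use a different pooling.
    summary = elements.mean(axis=0, keepdims=True)            # (1, d)
    augmented = np.concatenate([elements, summary], axis=0)   # (n + 1, d)
    # Residual connection around attention over elements plus summary.
    attended = augmented + self_attention(augmented, w_q, w_k, w_v)
    # Per-element outputs and the updated set-level vector.
    return attended[:-1], attended[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    for n in (3, 7):  # the same weights handle different cardinalities
        elem_out, set_out = encode_with_set_summary(
            rng.normal(size=(n, d)), w_q, w_k, w_v
        )
        print(n, elem_out.shape, set_out.shape)
```

Under these assumptions, reordering the input rows reorders the per-element outputs in the same way and leaves the set-level vector unchanged, and the same weights apply to sets of any size.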
Taxonomy
Topics: Machine Learning in Bioinformatics · Computational Drug Discovery Methods · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding · Dropout · Residual Connection
