Set Interdependence Transformer: Set-to-Sequence Neural Networks for Permutation Learning and Structure Prediction

Mateusz Jurewicz; Leon Derczynski

arXiv:2206.03720·cs.LG·June 9, 2022

TL;DR

The paper introduces the Set Interdependence Transformer, a neural network model that efficiently captures higher-order interactions in set-to-sequence tasks, improving performance across various domains including NLP and structure prediction.

Contribution

It proposes a novel set encoding method that enhances modeling of higher-order relations and integrates it into a complete set-to-sequence framework with state-of-the-art results.

Findings

01 Achieves state-of-the-art performance on multiple tasks

02 Demonstrates ability to generalize to unseen sequence lengths

03 Provides empirical analysis of higher-order interaction learning

Abstract

The task of learning to map an input set onto a permuted sequence of its elements is challenging for neural networks. Set-to-sequence problems occur in natural language processing, computer vision and structure prediction, where interactions between elements of large sets define the optimal output. Models must exhibit relational reasoning, handle varying cardinalities and manage combinatorial complexity. Previous attention-based methods require $n$ layers of their set transformations to explicitly represent $n$-th order relations. Our aim is to enhance their ability to efficiently model higher-order interactions through an additional interdependence component. We propose a novel neural set encoding method called the Set Interdependence Transformer, capable of relating the set's permutation-invariant representation to its elements within sets of any cardinality. We combine it with a…
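The abstract's requirements on a set encoder (a permutation-invariant representation of the whole set, related back to each element, for sets of any cardinality) can be illustrated with a minimal sketch. This is not the Set Interdependence Transformer itself: mean pooling and concatenation stand in for the paper's attention-based components, and the names `pool_set` and `relate_to_set` are hypothetical, chosen only for this illustration.

```python
from typing import List

Vector = List[float]


def pool_set(elements: List[Vector]) -> Vector:
    """Mean-pool element vectors into a single set vector.

    The result does not depend on the order of `elements`
    (permutation invariance) and works for any set size >= 1.
    Assumption: mean pooling is a stand-in, not the paper's encoder.
    """
    n = len(elements)
    dim = len(elements[0])
    return [sum(e[d] for e in elements) / n for d in range(dim)]


def relate_to_set(elements: List[Vector]) -> List[Vector]:
    """Append the pooled set vector to every element vector.

    Each output row sees both its own features and a summary of the
    whole set. Reordering the input reorders the output rows in the
    same way (permutation equivariance).
    """
    summary = pool_set(elements)
    return [e + summary for e in elements]


# A set of three 2-dimensional elements and a reordering of it.
original = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
shuffled = [original[2], original[0], original[1]]

# The pooled representation is identical for both orderings.
same_summary = pool_set(original) == pool_set(shuffled)

# The per-element outputs follow the input ordering.
related_original = relate_to_set(original)
related_shuffled = relate_to_set(shuffled)
```

A downstream permutation module (for example a pointer-style decoder) would consume the per-element outputs; because each row already carries information about the full set, ordering decisions can depend on set-level context rather than on the arbitrary input order.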

Taxonomy

Topics: Machine Learning in Bioinformatics · Computational Drug Discovery Methods · Text and Document Classification Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding · Dropout · Residual Connection