The DEformer: An Order-Agnostic Distribution Estimating Transformer
Michael A. Alcorn, Anh Nguyen

TL;DR
The paper introduces the DEformer, a Transformer-based model for order-agnostic autoregressive distribution estimation. Each input pairs a feature's identity with its value, so the model does not depend on a fixed feature order.
Contribution
It proposes a feature identity encoding, in which each feature's identity is supplied alongside its value, that lets Transformers perform order-agnostic distribution estimation without architectural modifications.
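For reference, the order-agnostic setting can be written with the standard autoregressive chain rule. The notation below (d features and a permutation σ of the feature indices) is introduced here for illustration and is not taken from this summary.

```latex
p(x_1, \dots, x_d) = \prod_{i=1}^{d} p\left(x_{\sigma(i)} \mid x_{\sigma(1)}, \dots, x_{\sigma(i-1)}\right)
```

The identity holds for every permutation σ of {1, ..., d}. A fixed-order estimator models the conditionals for a single σ, whereas an order-agnostic estimator aims to model them for arbitrary σ.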
Findings
The DEformer models binarized-MNIST effectively, approaching the performance of fixed-order autoregressive methods.
It outperforms recent flow-based models on tabular data.
The approach simplifies order-agnostic modeling with transformers.
Abstract
Order-agnostic autoregressive distribution (density) estimation (OADE), i.e., autoregressive distribution estimation where the features can occur in an arbitrary order, is a challenging problem in generative machine learning. Prior work on OADE has encoded feature identity by assigning each feature to a distinct fixed position in an input vector. As a result, architectures built for these inputs must strategically mask either the input or model weights to learn the various conditional distributions necessary for inferring the full joint distribution of the dataset in an order-agnostic way. In this paper, we propose an alternative approach for encoding feature identities, where each feature's identity is included alongside its value in the input. This feature identity encoding strategy allows neural architectures designed for sequential data to be applied to the OADE task without…
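The idea of placing each feature's identity alongside its value can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `encode_tokens` and `causal_mask` are hypothetical, identities are one-hot vectors, and each feature contributes two tokens (an identity-only "query" token followed by an identity-plus-value token) so that a causally masked sequence model could predict each value from the features revealed before it.

```python
import numpy as np


def encode_tokens(values, order):
    """Build an interleaved token sequence for one example (illustrative layout).

    values: 1-D array of d feature values.
    order:  permutation of range(d) giving the order in which features are revealed.
    Returns an array of shape (2 * d, d + 2); each row is
    [one-hot feature identity (d), value (1), has_value flag (1)].
    """
    values = np.asarray(values, dtype=np.float64)
    d = values.shape[0]
    tokens = np.zeros((2 * d, d + 2))
    for step, feat in enumerate(order):
        # Query token: identity only; a model would predict the value here.
        tokens[2 * step, feat] = 1.0
        # Value token: identity plus the observed value, visible to later steps.
        tokens[2 * step + 1, feat] = 1.0
        tokens[2 * step + 1, d] = values[feat]
        tokens[2 * step + 1, d + 1] = 1.0
    return tokens


def causal_mask(seq_len):
    """Lower-triangular boolean mask: position i may attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0])  # toy binarized example with d = 4
order = rng.permutation(len(x))     # a different random order can be drawn per example
tokens = encode_tokens(x, order)
mask = causal_mask(len(tokens))
targets = x[order]                  # value to predict at each query token
```

Because the identity travels with every token, the same example can be presented in any order without assigning features to fixed input positions; only `order` changes, while the token format and the causal mask stay the same.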
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Attention Is All You Need · Adam · Label Smoothing · Residual Connection · Dense Connections
