Abundance-Aware Set Transformer for Microbiome Sample Embedding
Hyunwoo Yoo, Gail Rosen

TL;DR
This paper introduces an abundance-aware Set Transformer that weights microbiome sequence embeddings by taxa abundance, leading to improved sample representations for classification tasks.
Contribution
It presents a novel method integrating taxa abundance into Transformer-based embeddings without altering the model architecture.
Findings
Outperforms average pooling and unweighted Set Transformers
Achieves perfect classification performance in some cases
Demonstrates the importance of abundance information in microbiome embedding
Abstract
Representing microbiome samples as input to LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. While prior studies have explored embedding-based representations of each microbiome sample, most rely on simple averaging over sequence embeddings, often overlooking the biological importance of taxa abundance. In this work, we propose an abundance-aware variant of the Set Transformer that constructs fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate embedding vectors in proportion to their abundance and apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the…
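The replicate-then-aggregate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `set_size` parameter, the rule that every taxon keeps at least one copy, and the single-seed attention pooling (a simplified stand-in for a Set Transformer's learned pooling) are all assumptions made for illustration.

```python
import math


def replicate_by_abundance(embeddings, abundances, set_size=100):
    """Repeat each embedding roughly in proportion to its relative abundance.

    Assumption: every taxon keeps at least one copy, so the output length
    may differ slightly from set_size.
    """
    total = sum(abundances)
    replicated = []
    for emb, count in zip(embeddings, abundances):
        copies = max(1, round(set_size * count / total))
        replicated.extend([emb] * copies)
    return replicated


def attention_pool(vectors, seed):
    """Pool a set of vectors with softmax attention against one seed vector.

    Hypothetical stand-in for learned attention pooling; in a trained model
    the seed (and any projections) would be learned parameters.
    """
    dim = len(seed)
    scores = [
        sum(q * k for q, k in zip(seed, vec)) / math.sqrt(dim) for vec in vectors
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / norm
        for i in range(dim)
    ]


# Two taxa with toy 2-d embeddings; the first is three times as abundant.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
abundances = [3, 1]

replicated = replicate_by_abundance(embeddings, abundances, set_size=4)
sample_embedding = attention_pool(replicated, seed=[0.0, 0.0])
```

With an all-zero seed the attention weights are uniform, so pooling over the replicated set reduces to an abundance-weighted average. A non-trivial seed shifts weight toward particular embeddings while the replication still biases the result toward abundant taxa, without any change to the pooling function itself.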
