Shortlisting Model: A Streamlined Simplex Diffusion for Discrete Variable Generation
Yuxuan Song, Zhe Zhang, Yu Pei, Jingjing Gong, Qiying Yu, Zheng Zhang, Mingxuan Wang, Hao Zhou, Jingjing Liu, Wei-Ying Ma

TL;DR
The paper introduces Shortlisting Model (SLM), a simplex-based diffusion approach that simplifies discrete variable generation, improving scalability and performance in biological and language modeling tasks.
Contribution
SLM is a novel simplex diffusion model that operates on simplex centroids to reduce generation complexity, and it incorporates a flexible implementation of classifier-free guidance that improves unconditional generation.
Findings
Competitive performance on DNA and protein design tasks
Effective in character-level and large-vocabulary language modeling
Improved scalability from reduced generation complexity
Abstract
Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM. Our code can be found at https://github.com/GenSI-THUAIR/SLM
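The abstract's idea of "progressive candidate pruning" on simplex centroids can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a fixed score vector in place of a learned network, a hypothetical halving schedule, and illustrative function names (`centroid`, `prune`, `shortlist_trajectory`). Each state is the centroid of the simplex face spanned by the tokens still on the shortlist, that is, a uniform distribution over the remaining candidates.

```python
def centroid(candidates, vocab_size):
    """Centroid of the simplex face spanned by `candidates`:
    equal probability mass on each candidate, zero elsewhere."""
    if not candidates:
        raise ValueError("candidate set must be non-empty")
    mass = 1.0 / len(candidates)
    return [mass if i in candidates else 0.0 for i in range(vocab_size)]


def prune(candidates, scores, keep):
    """Keep the `keep` highest-scoring candidates.
    Assumption: a plain top-k rule stands in for whatever the model learns."""
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return set(ranked[:keep])


def shortlist_trajectory(scores):
    """Halve the shortlist until one token remains (hypothetical schedule).
    Returns the list of centroids visited, from uniform to one-hot."""
    vocab_size = len(scores)
    candidates = set(range(vocab_size))
    states = [centroid(candidates, vocab_size)]
    while len(candidates) > 1:
        keep = max(1, len(candidates) // 2)
        candidates = prune(candidates, scores, keep)
        states.append(centroid(candidates, vocab_size))
    return states


if __name__ == "__main__":
    # Toy scores over a 4-token vocabulary (made-up numbers for illustration).
    for state in shortlist_trajectory([0.1, 0.7, 0.05, 0.15]):
        print(state)
```

In this toy setting the trajectory starts at the centroid of the full simplex (uniform over all tokens) and ends at a vertex (a single token), which is the intuition behind shrinking a shortlist step by step.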
