Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation
Nabin Giri, Steven Farrell, Kristofer E. Bouchard

TL;DR
Yeti is a new, compact protein structure tokenizer that improves multimodal protein modeling through higher codebook utilization, greater token diversity, and stronger generative capabilities, all with fewer parameters.
Contribution
Yeti introduces a simple, end-to-end trained protein structure tokenizer based on lookup-free quantization, enabling scalable multimodal protein modeling with improved diversity and generative performance.
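Lookup-free quantization (LFQ) removes the nearest-neighbor codebook search of a standard VQ-VAE: each latent dimension is quantized independently to its sign, so a d-dimensional latent indexes an implicit codebook of 2^d entries. The minimal pure-Python sketch below illustrates that general technique (as popularized by MAGVIT-v2) for a single residue. The function names and the 4-dimensional latent are hypothetical, and Yeti's actual latent size and implementation details are not given in this summary.

```python
from typing import List, Sequence


def lfq_quantize(z: Sequence[float]) -> List[float]:
    """Quantize each latent dimension independently to +1.0 or -1.0 by its sign.

    There is no learned codebook to search: the implicit codebook is {-1, +1}^d.
    Values that are exactly 0 map to -1.0.
    """
    return [1.0 if x > 0 else -1.0 for x in z]


def lfq_token_id(z: Sequence[float]) -> int:
    """Map the binary code to an integer token in [0, 2**d).

    Bit i of the token is set when latent dimension i is positive.
    """
    return sum(1 << i for i, x in enumerate(z) if x > 0)


def lfq_decode_id(token: int, dim: int) -> List[float]:
    """Invert lfq_token_id: recover the {-1, +1} code for a token index."""
    return [1.0 if (token >> i) & 1 else -1.0 for i in range(dim)]


# A hypothetical 4-dimensional encoder output for one residue -> one of 2**4 = 16 tokens.
z = [0.3, -1.2, 0.05, -0.7]
code = lfq_quantize(z)      # [1.0, -1.0, 1.0, -1.0]
token = lfq_token_id(z)     # bits 0 and 2 set -> 1 + 4 = 5
assert lfq_decode_id(token, len(z)) == code
```

During training, the non-differentiable sign is typically bypassed with a straight-through estimator, and MAGVIT-v2 adds an entropy penalty to encourage even code usage; whether Yeti uses the same regularizers is an assumption, not something this summary confirms.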
Findings
Yeti achieves the best codebook utilization and token diversity among compared models.
Yeti attains the second-best reconstruction accuracy with 10x fewer parameters than ESM3.
A multimodal model trained with Yeti generates plausible protein structures and sequences, comparable to those of larger models.
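This summary does not define codebook utilization or token diversity. One common way to measure them, assumed here rather than taken from the paper, is the fraction of codebook entries that appear at least once across a tokenized dataset, and the perplexity (exponentiated entropy) of the pooled token distribution. The function names and toy data below are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def codebook_utilization(token_seqs: Iterable[Sequence[int]], codebook_size: int) -> float:
    """Fraction of the codebook used at least once across all tokenized proteins."""
    used = set()
    for seq in token_seqs:
        used.update(seq)
    return len(used) / codebook_size


def token_perplexity(token_seqs: Iterable[Sequence[int]]) -> float:
    """exp(entropy) of the pooled token frequency distribution.

    Equals 1 when a single token is used everywhere and equals the number of
    distinct tokens when all used tokens are equally frequent.
    """
    counts = Counter(tok for seq in token_seqs for tok in seq)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Toy example: a hypothetical 8-entry codebook and two short tokenized "proteins".
proteins = [[0, 1, 1, 2], [2, 3, 0, 1]]
print(codebook_utilization(proteins, codebook_size=8))  # 4 of 8 entries used -> 0.5
print(token_perplexity(proteins))
```

Utilization only counts whether an entry is ever used, while perplexity also rewards spreading usage evenly, which is why the two are often reported together.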
Abstract
Multimodal models that jointly reason over protein sequences, structures, and function annotations within a unified representation hold immense potential for integrating multimodal data and generating new proteins with designed functional properties. To use transformer architectures, such models require a tokenizer that converts protein structure from continuous atomic coordinates into discrete representations suitable for scalable multimodal training. The quality of such models is fundamentally upper-bounded by the fidelity and expressiveness of the underlying tokenized structure. However, existing tokenizers prioritize reconstruction over generative abilities. To address this gap, we introduce Yeti, a simple and compact protein structure tokenizer based on lookup-free quantization and trained end-to-end with a flow matching objective for multimodal learning. Compared to…
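The abstract does not spell out Yeti's flow-matching setup. As a generic illustration only, the sketch below shows a standard conditional flow-matching loss on a straight-line (rectified-flow style) path from Gaussian noise to data, plus a simple Euler sampler. Here `velocity_fn` is a hypothetical stand-in for a decoder that maps structure tokens back to coordinates; the names, the linear path, and the sampler are assumptions, not the paper's method.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical decoder signature: velocity for noisy coordinates x_t at time t,
# conditioned on a sequence of structure tokens.
VelocityFn = Callable[[Vector, float, Sequence[int]], Vector]


def flow_matching_loss(velocity_fn: VelocityFn, x1: Vector,
                       tokens: Sequence[int], rng: random.Random) -> float:
    """Single-sample conditional flow-matching loss on a linear noise-to-data path."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]               # Gaussian noise sample
    t = rng.random()                                     # time drawn from [0, 1)
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]   # point on the straight path
    target = [b - a for a, b in zip(x0, x1)]             # constant velocity of that path
    pred = velocity_fn(xt, t, tokens)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(x1)


def sample_euler(velocity_fn: VelocityFn, tokens: Sequence[int], dim: int,
                 steps: int, rng: random.Random) -> Vector:
    """Integrate dx/dt = v(x, t, tokens) from t = 0 (noise) to t = 1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k in range(steps):
        t = k / steps
        v = velocity_fn(x, t, tokens)
        x = [a + vi / steps for a, vi in zip(x, v)]
    return x


# Sanity check with an "oracle" that knows the target x1: its velocity
# (x1 - x_t) / (1 - t) equals x1 - x0 on the linear path, so the loss is ~0.
x1 = [1.0, -2.0, 0.5]


def oracle(xt: Vector, t: float, tokens: Sequence[int]) -> Vector:
    return [(b - a) / (1 - t) for a, b in zip(xt, x1)]


print(flow_matching_loss(oracle, x1, [3, 7], random.Random(0)))  # close to 0
print(sample_euler(oracle, [3, 7], dim=3, steps=10, rng=random.Random(1)))  # close to x1
```

In a real tokenizer, `velocity_fn` would be a trained network and `x1` the residue coordinates; the oracle only demonstrates that the loss and sampler are consistent with each other.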
