Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation | Tomesphere