SOLO: A Single Transformer for Scalable Vision-Language Modeling