LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning