Vision-aligned Latent Reasoning for Multi-modal Large Language Model