ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention