AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding