MTA: Multimodal Task Alignment for BEV Perception and Captioning