ViTOC: Vision Transformer and Object-aware Captioner