ViTA: Visual-Linguistic Translation by Aligning Object Tags