Vision Transformer Based Model for Describing a Set of Images as a Story