Grounding Language Models to Images for Multimodal Inputs and Outputs