World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering