VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models