Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice