DiT4DiT: Jointly Modeling Video Dynamics and Actions for Generalizable Robot Control